diff mbox

[RFC,v0,05/49] pnfsd: introduce pnfsd header files

Message ID 1380220810-12909-1-git-send-email-bhalevy@primarydata.com (mailing list archive)
State New, archived
Headers show

Commit Message

Benny Halevy Sept. 26, 2013, 6:40 p.m. UTC
From: Andy Adamson <andros@umich.edu>

pnfsd data structures used internally and over the export API.

[extracted from pnfsd: Initial pNFS server implementation.]
[pnfsd: remove CONFIG_PNFSD from nfsd4_pnfs.h]
Signed-off-by: Andy Adamson <andros@netapp.com>
[moved {include/linux,fs}/nfsd/pnfsd.h]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@primarydata.com>
---
 fs/nfsd/pnfsd.h                 | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/nfsd/nfsd4_pnfs.h | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)
 create mode 100644 fs/nfsd/pnfsd.h
 create mode 100644 include/linux/nfsd/nfsd4_pnfs.h

Comments

Christoph Hellwig Sept. 29, 2013, 11:43 a.m. UTC | #1
Empty header are pretty useless.  Also why would you want a header
outside fs/nfsd/ ?

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Sept. 29, 2013, 12:12 p.m. UTC | #2
On 2013-09-29 14:43, Christoph Hellwig wrote:
> Empty header are pretty useless.

Right. This was useful early on while we re-ordered the patches that followed.
No need for that now so I'll squash this patch accordingly.

> Also why would you want a header
> outside fs/nfsd/ ?

This header contains the file system interface.

Benny

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 29, 2013, 12:13 p.m. UTC | #3
On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
> > Also why would you want a header
> > outside fs/nfsd/ ?
> 
> This header contains the file system interface.

Any interface for the filesystem should be part of exportfs.h, not
something nfs-specific.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Sept. 29, 2013, 12:20 p.m. UTC | #4
On 2013-09-29 15:13, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
>>> Also why would you want a header
>>> outside fs/nfsd/ ?
>>
>> This header contains the file system interface.
> 
> Any interface for the filesystem should be part of exportfs.h, not
> something nfs-specific.

Makes sense.  Thanks.

Bruce - are you ok with moving the pnfs interface definitions to
include/linux/exportfs.h along with struct export_operations?

In fact we can actually extend struct export_operations rather
than adding pnfs_export_operations...

Benny

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 29, 2013, 12:21 p.m. UTC | #5
On Sun, Sep 29, 2013 at 03:20:33PM +0300, Benny Halevy wrote:
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?
> 
> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...

Yes, it probably should go into the export ops, although the actual
method signatures might need to be made a litle less nfs-specific for
that.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Sept. 29, 2013, 12:35 p.m. UTC | #6
On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
> > Bruce - are you ok with moving the pnfs interface definitions to
> > include/linux/exportfs.h along with struct export_operations?
> > 
> > In fact we can actually extend struct export_operations rather
> > than adding pnfs_export_operations...
> 
> Yes, it probably should go into the export ops, although the actual
> method signatures might need to be made a litle less nfs-specific for
> that.

I jsut took a brief look over the diff for the whole series in the git
tree and the old tree that still had block and exofs servers and have
revised my opinion a little bit:


 - the should be a layout_type field in struct export_operations,
   indicating that a filesystem support some sort of pnfs-like export.
 - there should be a struct pnfs_operations, but it should be confined
   to fs/nfsd: each layout can be a separate loadable module and gets
   registered there.  For the initial file layout that module is
   self-contained, but for e.g. block or objects it would have
   call into the filesystem through export_ops, although way lower level
   than the NFS XDR level, e.g. for block there would be one of to get
   the extent map, and one to allocate an extent.

This way we alsod avoid the dependcy on nfsd in the filesystems that the
cureent version introduces.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Sept. 30, 2013, 3:23 p.m. UTC | #7
On 2013-09-29 15:35, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>> Bruce - are you ok with moving the pnfs interface definitions to
>>> include/linux/exportfs.h along with struct export_operations?
>>>
>>> In fact we can actually extend struct export_operations rather
>>> than adding pnfs_export_operations...
>>
>> Yes, it probably should go into the export ops, although the actual
>> method signatures might need to be made a litle less nfs-specific for
>> that.
> 
> I jsut took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
> 
> 
>  - the should be a layout_type field in struct export_operations,
>    indicating that a filesystem support some sort of pnfs-like export.
>  - there should be a struct pnfs_operations, but it should be confined
>    to fs/nfsd: each layout can be a separate loadable module and gets
>    registered there.  For the initial file layout that module is
>    self-contained, but for e.g. block or objects it would have
>    call into the filesystem through export_ops, although way lower level

This makes sense for blocks for its use of the generic block allocation and mapping
calls (and it needs a new call for committing uninitialized extents)
But for objects there are no such calls and the integration with exofs
is pretty intimate.

Benny

>    than the NFS XDR level, e.g. for block there would be one of to get
>    the extent map, and one to allocate an extent.
> 
> This way we alsod avoid the dependcy on nfsd in the filesystems that the
> cureent version introduces.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Oct. 1, 2013, 1:05 a.m. UTC | #8
On 09/29/2013 05:35 AM, Christoph Hellwig wrote:> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>> Bruce - are you ok with moving the pnfs interface definitions to
>>> include/linux/exportfs.h along with struct export_operations?
>>>
>>> In fact we can actually extend struct export_operations rather
>>> than adding pnfs_export_operations...
>>
>> Yes, it probably should go into the export ops, although the actual
>> method signatures might need to be made a litle less nfs-specific for
>> that.
>
> I jsut took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
>
>
>  - the should be a layout_type field in struct export_operations,
>    indicating that a filesystem support some sort of pnfs-like export.

The pnfs protocol and people have plans to, allow a multi typed
layouts from the same super-block. It is a per file attribute.
It even allows a multi protocol access to the same file.
The only flag should be the presence of the layout_get vector
that should indicate support or lack of it.

(In fact I would remove layout_type field completly it's place
 is only as an input to LO_GET and DEVICE_INFO)

>  - there should be a struct pnfs_operations, but it should be confined
>    to fs/nfsd: each layout can be a separate loadable module and gets
>    registered there.  For the initial file layout that module is
>    self-contained, but for e.g. block or objects it would have
>    call into the filesystem through export_ops, although way lower level
>    than the NFS XDR level, e.g. for block there would be one of to get
>    the extent map, and one to allocate an extent.
>

No! This does not make any sense. What you say does not fit any model of any
cluster filesystem today.

- Again the FS can support any protocol.
- Only the FS understand the structure and layout of the file access. Any
  other model is a specific implementation and breaks abstraction. The only true
  abstraction is the LO_GET LO_RETURN LO_COMMIT DEVICE_INFO and LO_CB_RECALL. anything
  else is making assumptions.

There is a pnfs vector and it is at this abstraction level exactly.

> This way we alsod avoid the dependcy on nfsd in the filesystems that the
> cureent version introduces.

There is no "dependency on nfsd in the filesystems"

The only dependency the FS has is an import of some library routines
at exportfs that take an abstract layout and device descriptions and encode
them into an XDR buffer. But the FS knows nothing of the XDR and the
NFSD is free to unload at any moment without forcing the FS to unload
first or at all.
This is actually tested, in fact I do this all the time when I want to
start fresh and have NFSD close all resources on the FS. 

Nothing changed, the FS is independent and NFSD is dependent on the FS,
but in an abstract way via an exports vector.

Where did you see such dependency?


> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


Cheers
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Boaz Harrosh Oct. 1, 2013, 1:41 a.m. UTC | #9
On 09/29/2013 05:20 AM, Benny Halevy wrote:
> Makes sense.  Thanks.
>
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?
>

I disagree this is a bloat. It is big enough as it is. For code clarity
and maintenance we should split. Yes they are related but then
lots of stuff are related but we want to keep them separate compact
 and readable. why put everything in the same file, it does not make
any sense.

You have a library exportfs that exports a few related but different
interfaces, In fact this header is not exported from exportfs it is
the type system that defines the pnfs operations. They are so
big and verbose they call for a separate header.

> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...
>

This is a great waist of space. Any FS that does not support
pnfs, Which is currently all but exofs, will have 7 NULLs instead
of one.
And putting a struct pnfs_export_operations pointer inside
struct export_operations gives you nothing but an extra dereference
and funny looking code. Current system is just the most simple and
most efficient. Why the extra complexity?

> Benny

Cheers
Boaz


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 1, 2013, 1:19 p.m. UTC | #10
On Mon, Sep 30, 2013 at 06:23:42PM +0300, Benny Halevy wrote:
> This makes sense for blocks for its use of the generic block allocation and mapping
> calls (and it needs a new call for committing uninitialized extents)
> But for objects there are no such calls and the integration with exofs
> is pretty intimate.

That's just because there is no proper split between exofs and the
pnfsd-objects layer.  The split between the two doesn't seem too hard
and would dramatically improve the interface.

With that and moving the recall handling on truncate to generic code
where it belongs almost nothin pnfs-specific will be left in exofs.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 1, 2013, 1:33 p.m. UTC | #11
On Mon, Sep 30, 2013 at 06:05:18PM -0700, Boaz Harrosh wrote:
> The pnfs protocol and people have plans to, allow a multi typed
> layouts from the same super-block. It is a per file attribute.
> It even allows a multi protocol access to the same file.
> The only flag should be the presence of the layout_get vector
> that should indicate support or lack of it.

The current method doesn't help with that as it can return a single type
only anyway.  So in principle I agree with you, but the way to fix it is
not to keep the method, but to make sure it returns a bitmap of
supported layouts.

> >  - there should be a struct pnfs_operations, but it should be confined
> >    to fs/nfsd: each layout can be a separate loadable module and gets
> >    registered there.  For the initial file layout that module is
> >    self-contained, but for e.g. block or objects it would have
> >    call into the filesystem through export_ops, although way lower level
> >    than the NFS XDR level, e.g. for block there would be one of to get
> >    the extent map, and one to allocate an extent.
> >
> 
> No! This does not make any sense. What you say does not fit any model of any
> cluster filesystem today.
> 
> - Again the FS can support any protocol.
> - Only the FS understand the structure and layout of the file access. Any
>   other model is a specific implementation and breaks abstraction. The only true
>   abstraction is the LO_GET LO_RETURN LO_COMMIT DEVICE_INFO and LO_CB_RECALL. anything
>   else is making assumptions.
> 
> There is a pnfs vector and it is at this abstraction level exactly.

No, the problem is that the pnfs_export_operations are entirely at the
wrong level, as I tried to explain.  The right level is very different
for the different layouts:

 - for files it needs to boil down to a:

	 - get a list of devices
	 - given an inode/offset return the layout

 - for block it's get a block map for a file / create an unwritten
   extent / convert it to written

 - for object it seems (not too familar):

    - get a list of devices for this fs
    - given an inode/offset return the layout
    - tell the fs that I/O has finished

As all the layouts operate on different data structures it makes sense
to make the methods operate on those, and keep the boilerplate code
including the XDR encoding/decoding in one single place.

Now how these pnfsd object layout drivers communicate with the fs I
don't have an opinion on until we see the actual code, maybe we need a
pnfs_<layout>_ops if it's complicated enough.  For the files case that
can just call into dlm directly currently as we have no other
interesting cluster fs in tree it's a mood point.  For block it's simple
enough that I'd just add it to export_ops, if not by that time we redo
the current get_blocks mess in a way that we can simply piggyback it on
that which would be even easier.

> > This way we alsod avoid the dependcy on nfsd in the filesystems that the
> > cureent version introduces.
> 
> There is no "dependency on nfsd in the filesystems"

The patchset as pulled will created a depency on nfsd.ko from gfs2.ko

> The only dependency the FS has is an import of some library routines
> at exportfs that take an abstract layout and device descriptions and encode
> them into an XDR buffer. But the FS knows nothing of the XDR and the
> NFSD is free to unload at any moment without forcing the FS to unload
> first or at all.
> This is actually tested, in fact I do this all the time when I want to
> start fresh and have NFSD close all resources on the FS. 

That's not what happens with the file layout as posted, and it's not
something I want to see happen every, btw.  In Linux we're all about
proper abstractions, and letting a fs control all pnfs aspects directly
instead of having common code is a receipe for tons of copy & pasted
code full of different bugs if we ever get additional implementations of
a layout.  Not that I really expect any in tree as all the other
"interested partied" have shown to be leechers that just want to keep
their filesystems out of tree and most of the time as illegal propritary
modules anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Oct. 1, 2013, 7:43 p.m. UTC | #12
On Sun, Sep 29, 2013 at 03:20:33PM +0300, Benny Halevy wrote:
> On 2013-09-29 15:13, Christoph Hellwig wrote:
> > On Sun, Sep 29, 2013 at 03:12:41PM +0300, Benny Halevy wrote:
> >>> Also why would you want a header
> >>> outside fs/nfsd/ ?
> >>
> >> This header contains the file system interface.
> > 
> > Any interface for the filesystem should be part of exportfs.h, not
> > something nfs-specific.
> 
> Makes sense.  Thanks.
> 
> Bruce - are you ok with moving the pnfs interface definitions to
> include/linux/exportfs.h along with struct export_operations?

Fine by me.--b.

> 
> In fact we can actually extend struct export_operations rather
> than adding pnfs_export_operations...
> 
> Benny
> 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields Oct. 1, 2013, 8:30 p.m. UTC | #13
On Sun, Sep 29, 2013 at 05:35:53AM -0700, Christoph Hellwig wrote:
> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
> > > Bruce - are you ok with moving the pnfs interface definitions to
> > > include/linux/exportfs.h along with struct export_operations?
> > > 
> > > In fact we can actually extend struct export_operations rather
> > > than adding pnfs_export_operations...
> > 
> > Yes, it probably should go into the export ops, although the actual
> > method signatures might need to be made a litle less nfs-specific for
> > that.
> 
> I jsut took a brief look over the diff for the whole series in the git
> tree and the old tree that still had block and exofs servers and have
> revised my opinion a little bit:
> 
> 
>  - the should be a layout_type field in struct export_operations,
>    indicating that a filesystem support some sort of pnfs-like export.
>  - there should be a struct pnfs_operations, but it should be confined
>    to fs/nfsd: each layout can be a separate loadable module and gets
>    registered there.  For the initial file layout that module is
>    self-contained, but for e.g. block or objects it would have
>    call into the filesystem through export_ops, although way lower level
>    than the NFS XDR level, e.g. for block there would be one of to get
>    the extent map, and one to allocate an extent.

That sounds OK to me.

My (possibly faulty) memory of how the xdr-encoding-in-the-fs came
about: I think people weren't willing to commit to any reasonable upper
limit on the size of the layout.  So they originally considered
something more like readdir that would loop over extents and pass them
to a callback for encoding.  But xdr isn't complicated so you could
instead give the filesystem a simple library of xdr encoders to call.

But it sounds like that's not exactly what got implemented.  And maybe
nobody actually has such big layouts.

> This way we alsod avoid the dependcy on nfsd in the filesystems that the
> cureent version introduces.

Ugh--thanks for catching that.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 2, 2013, 11:35 a.m. UTC | #14
On 2013-10-01 16:33, Christoph Hellwig wrote:
> On Mon, Sep 30, 2013 at 06:05:18PM -0700, Boaz Harrosh wrote:
>> The pnfs protocol and people have plans to, allow a multi typed
>> layouts from the same super-block. It is a per file attribute.
>> It even allows a multi protocol access to the same file.
>> The only flag should be the presence of the layout_get vector
>> that should indicate support or lack of it.
> 
> The current method doesn't help with that as it can return a single type
> only anyway.  So in principle I agree with you, but the way to fix it is
> not to keep the method, but to make sure it returns a bitmap of
> supported layouts.
> 
>>>  - there should be a struct pnfs_operations, but it should be confined
>>>    to fs/nfsd: each layout can be a separate loadable module and gets
>>>    registered there.  For the initial file layout that module is
>>>    self-contained, but for e.g. block or objects it would have
>>>    call into the filesystem through export_ops, although way lower level
>>>    than the NFS XDR level, e.g. for block there would be one of to get
>>>    the extent map, and one to allocate an extent.
>>>
>>
>> No! This does not make any sense. What you say does not fit any model of any
>> cluster filesystem today.
>>
>> - Again the FS can support any protocol.
>> - Only the FS understand the structure and layout of the file access. Any
>>   other model is a specific implementation and breaks abstraction. The only true
>>   abstraction is the LO_GET LO_RETURN LO_COMMIT DEVICE_INFO and LO_CB_RECALL. anything
>>   else is making assumptions.
>>
>> There is a pnfs vector and it is at this abstraction level exactly.
> 
> No, the problem is that the pnfs_export_operations are entirely at the
> wrong level, as I tried to explain.  The right level is very different
> for the different layouts:
> 
>  - for files it needs to boil down to a:
> 
> 	 - get a list of devices
> 	 - given an inode/offset return the layout
> 

For loosely clustered files-layout file systems the MDS would also
need to handle LAYOUTCOMMIT, and also generate and handle layout recalls,
depending on how much of that we can do in a generic way and how much is
file system specific.

>  - for block it's get a block map for a file / create an unwritten
>    extent / convert it to written

Besides converting to written there is also de-allocation of provisionally
allocated extents and more advanced stuff for client assisted copy-on-write
where after allocation of uninitialized extents the client can commit
extents to the block map by replacing existing initialized extents with
new ones.

> 
>  - for object it seems (not too familar):
> 
>     - get a list of devices for this fs
>     - given an inode/offset return the layout
>     - tell the fs that I/O has finished
> 
> As all the layouts operate on different data structures it makes sense
> to make the methods operate on those, and keep the boilerplate code
> including the XDR encoding/decoding in one single place.
> 
> Now how these pnfsd object layout drivers communicate with the fs I
> don't have an opinion on until we see the actual code, maybe we need a
> pnfs_<layout>_ops if it's complicated enough.

The original design this patchset builds on called for implementing the
layout type specifics as library code and let file systems supporting pnfs
implement the high level pnfs methods and use the library for encoding
and decoding layout-type specific XDR and use other helpers if needed.

You're calling for a "back-end layout driver" kind of layer that will serve
the layout type and will call into the file system using lower level methods.

>  For the files case that
> can just call into dlm directly currently as we have no other
> interesting cluster fs in tree it's a mood point.  For block it's simple
> enough that I'd just add it to export_ops, if not by that time we redo
> the current get_blocks mess in a way that we can simply piggyback it on
> that which would be even easier.
> 

On the same lines, I think that for now calling directly into exofs makes
the most sense as we have no other object based pnfs implementation.

>>> This way we alsod avoid the dependcy on nfsd in the filesystems that the
>>> cureent version introduces.

Yeah, I agree that this dependency can and should be taken care of.

>>
>> There is no "dependency on nfsd in the filesystems"
> 
> The patchset as pulled will created a depency on nfsd.ko from gfs2.ko
> 

yup, fs/nfsd/nfs4pnfsdlm.c exports pnfs_dlm_export_ops and implements
the respective methods.  This does not really belong to nfsd but rather
to the fs/dlm module.

Benny

>> The only dependency the FS has is an import of some library routines
>> at exportfs that take an abstract layout and device descriptions and encode
>> them into an XDR buffer. But the FS knows nothing of the XDR and the
>> NFSD is free to unload at any moment without forcing the FS to unload
>> first or at all.
>> This is actually tested, in fact I do this all the time when I want to
>> start fresh and have NFSD close all resources on the FS. 
> 
> That's not what happens with the file layout as posted, and it's not
> something I want to see happen every, btw.  In Linux we're all about
> proper abstractions, and letting a fs control all pnfs aspects directly
> instead of having common code is a receipe for tons of copy & pasted
> code full of different bugs if we ever get additional implementations of
> a layout.  Not that I really expect any in tree as all the other
> "interested partied" have shown to be leechers that just want to keep
> their filesystems out of tree and most of the time as illegal propritary
> modules anyway.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 2, 2013, 11:36 a.m. UTC | #15
On 2013-10-01 23:30, J. Bruce Fields wrote:
> On Sun, Sep 29, 2013 at 05:35:53AM -0700, Christoph Hellwig wrote:
>> On Sun, Sep 29, 2013 at 05:21:30AM -0700, Christoph Hellwig wrote:
>>>> Bruce - are you ok with moving the pnfs interface definitions to
>>>> include/linux/exportfs.h along with struct export_operations?
>>>>
>>>> In fact we can actually extend struct export_operations rather
>>>> than adding pnfs_export_operations...
>>>
>>> Yes, it probably should go into the export ops, although the actual
>>> method signatures might need to be made a litle less nfs-specific for
>>> that.
>>
>> I jsut took a brief look over the diff for the whole series in the git
>> tree and the old tree that still had block and exofs servers and have
>> revised my opinion a little bit:
>>
>>
>>  - the should be a layout_type field in struct export_operations,
>>    indicating that a filesystem support some sort of pnfs-like export.
>>  - there should be a struct pnfs_operations, but it should be confined
>>    to fs/nfsd: each layout can be a separate loadable module and gets
>>    registered there.  For the initial file layout that module is
>>    self-contained, but for e.g. block or objects it would have
>>    call into the filesystem through export_ops, although way lower level
>>    than the NFS XDR level, e.g. for block there would be one of to get
>>    the extent map, and one to allocate an extent.
> 
> That sounds OK to me.
> 
> My (possibly faulty) memory of how the xdr-encoding-in-the-fs came
> about: I think people weren't willing to commit to any reasonable upper
> limit on the size of the layout.  So they originally considered
> something more like readdir that would loop over extents and pass them
> to a callback for encoding.  But xdr isn't complicated so you could
> instead give the filesystem a simple library of xdr encoders to call.
> 
> But it sounds like that's not exactly what got implemented.  And maybe
> nobody actually has such big layouts.
> 
>> This way we alsod avoid the dependcy on nfsd in the filesystems that the
>> cureent version introduces.
> 
> Ugh--thanks for catching that.

I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
fs/dlm.

Benny

> 
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 2, 2013, 4:06 p.m. UTC | #16
On Wed, Oct 02, 2013 at 02:35:42PM +0300, Benny Halevy wrote:
> For loosely clustered files-layout file systems the MDS would also
> need to handle LAYOUTCOMMIT, and also generate and handle layout recalls,
> depending on how much of that we can do in a generic way and how much is
> file system specific.

Which is something we can deal with in great detail once such a
filesystem and the pnfs support for it land in the kernel tree.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 2, 2013, 4:07 p.m. UTC | #17
On Wed, Oct 02, 2013 at 02:36:54PM +0300, Benny Halevy wrote:
> I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
> fs/dlm.

I don't reall care where in the tree it lives, but it needs to be
neither part of nfsd or the filesystem (respectively dlm in this case).

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 3, 2013, 6:02 a.m. UTC | #18
On 2013-10-02 19:07, Christoph Hellwig wrote:
> On Wed, Oct 02, 2013 at 02:36:54PM +0300, Benny Halevy wrote:
>> I suggest moving the code in fs/nfsd/nfs4pnfsdlm.c to
>> fs/dlm.
> 
> I don't reall care where in the tree it lives, but it needs to be
> neither part of nfsd or the filesystem (respectively dlm in this case).

Just that this is dlm specific logic.
For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
Or even knowing that
	layout->lg_stripe_type = STRIPE_SPARSE;
assumes knowledge of the underlying cluster fs implementation.

Benny

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 3, 2013, 9:55 a.m. UTC | #19
On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
> Just that this is dlm specific logic.
> For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
> Or even knowing that
> 	layout->lg_stripe_type = STRIPE_SPARSE;
> assumes knowledge of the underlying cluster fs implementation.

Which in-tree or soon in-tree filesystem do you care about?  And why
don't we see pnfs support for it submitted instead of the fairly useless
gfs2 support?

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 3, 2013, 12:29 p.m. UTC | #20
On 2013-10-03 12:55, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
>> Just that this is dlm specific logic.
>> For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
>> Or even knowing that
>> 	layout->lg_stripe_type = STRIPE_SPARSE;
>> assumes knowledge of the underlying cluster fs implementation.
> 
> Which in-tree or soon in-tree filesystem do you care about?  And why
> don't we see pnfs support for it submitted instead of the fairly useless
> gfs2 support?

I picked gfs2 as the initial use case for simplicity and ease of review.
If there is a rough consensus that it's useless and not worthy of inclusion
then the one we care about the most is exofs that has a more complete pnfs
implementation.

Benny

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 3, 2013, 12:37 p.m. UTC | #21
On Thu, Oct 03, 2013 at 03:29:06PM +0300, Benny Halevy wrote:
> I picked gfs2 as the initial use case for simplicity and ease of review.
> If there is a rough consensus that it's useless and not worthy of inclusion
> then the one we care about the most is exofs that has a more complete pnfs
> implementation.

This was in reference to file layout implementation details, so exofs
isn't a contender there.

As far as exofs is concerned a pnfs implementation based on it has just
as much toy status as the current gfs2 one.  While the pnfs side of it
might as well be a lot better, a filesystem that lacks all the integrity
and scalability features developed in the last 30 years can't be
considered more than a proof of concept.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ric Wheeler Oct. 3, 2013, 1:12 p.m. UTC | #22
On 10/03/2013 08:29 AM, Benny Halevy wrote:
> On 2013-10-03 12:55, Christoph Hellwig wrote:
>> >On Thu, Oct 03, 2013 at 09:02:27AM +0300, Benny Halevy wrote:
>>> >>Just that this is dlm specific logic.
>>> >>For example, using dlm_ino_hash() in nfsd4_pnfs_dlm_layoutget().
>>> >>Or even knowing that
>>> >>	layout->lg_stripe_type = STRIPE_SPARSE;
>>> >>assumes knowledge of the underlying cluster fs implementation.
>> >
>> >Which in-tree or soon in-tree filesystem do you care about?  And why
>> >don't we see pnfs support for it submitted instead of the fairly useless
>> >gfs2 support?
> I picked gfs2 as the initial use case for simplicity and ease of review.
> If there is a rough consensus that it's useless and not worthy of inclusion
> then the one we care about the most is exofs that has a more complete pnfs
> implementation.
>
> Benny
>

I don't see having GFS2 supported as a base for pNFS as useless. Christoph, is 
this a concern about GFS2 being too complicated for normal deployment or a lack 
in the pNFS support on top of it?

thanks!

Ric

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 3, 2013, 1:17 p.m. UTC | #23
On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
> >>>Which in-tree or soon in-tree filesystem do you care about?  And why
> >>>don't we see pnfs support for it submitted instead of the fairly useless
> >>>gfs2 support?
> >I picked gfs2 as the initial use case for simplicity and ease of review.
> >If there is a rough consensus that it's useless and not worthy of inclusion
> >then the one we care about the most is exofs that has a more complete pnfs
> >implementation.
> >
> >Benny
> >
> 
> I don't see having GFS2 supported as a base for pNFS as useless.
> Christoph, is this a concern about GFS2 being too complicated for
> normal deployment or a lack in the pNFS support on top of it?

Fairly useless was specific to the particular implementation:

 - which in the stipped down version here only supports DS access for
   reads
 - which in the previous version showed worse performance than always
   going through the MDS

I don't have a problem with using GFS2 by itself, but any implementation
proposed should actually show signifiant real life benefits before it
gets merged.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ric Wheeler Oct. 3, 2013, 1:18 p.m. UTC | #24
On 10/03/2013 09:17 AM, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
>>>>> Which in-tree or soon in-tree filesystem do you care about?  And why
>>>>> don't we see pnfs support for it submitted instead of the fairly useless
>>>>> gfs2 support?
>>> I picked gfs2 as the initial use case for simplicity and ease of review.
>>> If there is a rough consensus that it's useless and not worthy of inclusion
>>> then the one we care about the most is exofs that has a more complete pnfs
>>> implementation.
>>>
>>> Benny
>>>
>> I don't see having GFS2 supported as a base for pNFS as useless.
>> Christoph, is this a concern about GFS2 being too complicated for
>> normal deployment or a lack in the pNFS support on top of it?
> Fairly useless was specific to the particular implementation:
>
>   - which in the stipped down version here only supports DS access for
>     reads
>   - which in the previous version showed worse performance than always
>     going through the MDS
>
> I don't have a problem with using GFS2 by itself, but any implementation
> proposed should actually show signifiant real life benefits before it
> gets merged.
>

Makes sense, thanks!

Ric

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 3, 2013, 2:19 p.m. UTC | #25
On 2013-10-03 16:18, Ric Wheeler wrote:
> On 10/03/2013 09:17 AM, Christoph Hellwig wrote:
>> On Thu, Oct 03, 2013 at 09:12:24AM -0400, Ric Wheeler wrote:
>>>>>> Which in-tree or soon in-tree filesystem do you care about?  And why
>>>>>> don't we see pnfs support for it submitted instead of the fairly useless
>>>>>> gfs2 support?
>>>> I picked gfs2 as the initial use case for simplicity and ease of review.
>>>> If there is a rough consensus that it's useless and not worthy of inclusion
>>>> then the one we care about the most is exofs that has a more complete pnfs
>>>> implementation.
>>>>
>>>> Benny
>>>>
>>> I don't see having GFS2 supported as a base for pNFS as useless.
>>> Christoph, is this a concern about GFS2 being too complicated for
>>> normal deployment or a lack in the pNFS support on top of it?
>> Fairly useless was specific to the particular implementation:
>>
>>   - which in the stipped down version here only supports DS access for
>>     reads
>>   - which in the previous version showed worse performance than always
>>     going through the MDS
>>
>> I don't have a problem with using GFS2 by itself, but any implementation
>> proposed should actually show signifiant real life benefits before it
>> gets merged.
>>

The question is what is the minimum value for submitting upstream...

The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
One could use them load balancing, e.g. by either redirecting to a node
holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.

Benny

> 
> Makes sense, thanks!
> 
> Ric
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Oct. 3, 2013, 2:21 p.m. UTC | #26
On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
> The question is what is the minimum value for submitting upstream...
> 
> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
> One could use them load balancing, e.g. by either redirecting to a node
> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.

One could do a lot of things, especially given infinite time.  But what
does the current code buy us?  We need data on that to justify merging
such a large chunk of code.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ric Wheeler Oct. 3, 2013, 2:24 p.m. UTC | #27
On 10/03/2013 10:21 AM, Christoph Hellwig wrote:
> On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
>> The question is what is the minimum value for submitting upstream...
>>
>> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
>> One could use them load balancing, e.g. by either redirecting to a node
>> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.
> One could do a lot of things, especially given infinite time.  But what
> does the current code buy us?  We need data on that to justify merging
> such a large chunk of code.
>

I think that we can help figure out how to get something reasonably useful ready 
for upstream, just need to get some people spun up to help test out some of 
these ideas...

ric

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benny Halevy Oct. 3, 2013, 2:38 p.m. UTC | #28
On 2013-10-03 17:24, Ric Wheeler wrote:
> On 10/03/2013 10:21 AM, Christoph Hellwig wrote:
>> On Thu, Oct 03, 2013 at 05:19:31PM +0300, Benny Halevy wrote:
>>> The question is what is the minimum value for submitting upstream...
>>>
>>> The thing pnfs over dlm/gfs2 is missing mostly is supporting read/write layout.
>>> One could use them load balancing, e.g. by either redirecting to a node
>>> holding an exclusive lock on the file, if there is one, or dlm_ino_hash in its absence.
>> One could do a lot of things, especially given infinite time.  But what
>> does the current code buy us?  We need data on that to justify merging
>> such a large chunk of code.
>>
> 
> I think that we can help figure out how to get something reasonably useful ready 
> for upstream, just need to get some people spun up to help test out some of 
> these ideas...

That would be very helpful.  Much appreciated!

Benny

> 
> ric
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfsd/pnfsd.h b/fs/nfsd/pnfsd.h
new file mode 100644
index 0000000..65fb57e
--- /dev/null
+++ b/fs/nfsd/pnfsd.h
@@ -0,0 +1,37 @@ 
+/*
+ *  Copyright (c) 2005 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@umich.edu>
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the University nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef LINUX_NFSD_PNFSD_H
+#define LINUX_NFSD_PNFSD_H
+
+#endif /* LINUX_NFSD_PNFSD_H */
diff --git a/include/linux/nfsd/nfsd4_pnfs.h b/include/linux/nfsd/nfsd4_pnfs.h
new file mode 100644
index 0000000..9e7d95e
--- /dev/null
+++ b/include/linux/nfsd/nfsd4_pnfs.h
@@ -0,0 +1,37 @@ 
+/*
+ *  Copyright (c) 2006 The Regents of the University of Michigan.
+ *  All rights reserved.
+ *
+ *  Andy Adamson <andros@umich.edu>
+ *
+ *  Redistribution and use in source and binary forms, with or without
+ *  modification, are permitted provided that the following conditions
+ *  are met:
+ *
+ *  1. Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *  2. Redistributions in binary form must reproduce the above copyright
+ *     notice, this list of conditions and the following disclaimer in the
+ *     documentation and/or other materials provided with the distribution.
+ *  3. Neither the name of the University nor the names of its
+ *     contributors may be used to endorse or promote products derived
+ *     from this software without specific prior written permission.
+ *
+ *  THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED
+ *  WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ *  MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ *  DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ *  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ *  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ *  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ *  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+ *  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+ *  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ *  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _LINUX_NFSD_NFSD4_PNFS_H
+#define _LINUX_NFSD_NFSD4_PNFS_H
+
+#endif /* _LINUX_NFSD_NFSD4_PNFS_H */