Message ID | 20190809225833.6657-17-ira.weiny@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | RDMA/FS DAX truncate proposal V1,000,002 ;-) | expand |
On Mon, Aug 12, 2019 at 10:00:40AM -0300, Jason Gunthorpe wrote: > On Fri, Aug 09, 2019 at 03:58:30PM -0700, ira.weiny@intel.com wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > > > In order for MRs to be tracked against the open verbs context the ufile > > needs to have a pointer to hand to the GUP code. > > > > No references need to be taken as this should be valid for the lifetime > > of the context. > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > drivers/infiniband/core/uverbs.h | 1 + > > drivers/infiniband/core/uverbs_main.c | 1 + > > 2 files changed, 2 insertions(+) > > > > diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h > > index 1e5aeb39f774..e802ba8c67d6 100644 > > +++ b/drivers/infiniband/core/uverbs.h > > @@ -163,6 +163,7 @@ struct ib_uverbs_file { > > struct page *disassociate_page; > > > > struct xarray idr; > > + struct file *sys_file; /* backpointer to system file object */ > > }; > > The 'struct file' has a lifetime strictly shorter than the > ib_uverbs_file, which is kref'd on its own lifetime. Having a back > pointer like this is confouding as it will be invalid for some of the > lifetime of the struct. Ah... ok. I really thought it was the other way around. __fput() should not call ib_uverbs_close() until the last reference on struct file is released... What holds references to struct ib_uverbs_file past that? Perhaps I need to add this (untested)? diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index f628f9e4c09f..654e774d9cf2 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -1125,6 +1125,8 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp) list_del_init(&file->list); mutex_unlock(&file->device->lists_mutex); + file->sys_file = NULL; + kref_put(&file->ref, ib_uverbs_release_file); return 0;
On Mon, Aug 12, 2019 at 02:56:15PM -0300, Jason Gunthorpe wrote: > On Mon, Aug 12, 2019 at 10:28:27AM -0700, Ira Weiny wrote: > > On Mon, Aug 12, 2019 at 10:00:40AM -0300, Jason Gunthorpe wrote: > > > On Fri, Aug 09, 2019 at 03:58:30PM -0700, ira.weiny@intel.com wrote: > > > > From: Ira Weiny <ira.weiny@intel.com> > > > > > > > > In order for MRs to be tracked against the open verbs context the ufile > > > > needs to have a pointer to hand to the GUP code. > > > > > > > > No references need to be taken as this should be valid for the lifetime > > > > of the context. > > > > > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > > > drivers/infiniband/core/uverbs.h | 1 + > > > > drivers/infiniband/core/uverbs_main.c | 1 + > > > > 2 files changed, 2 insertions(+) > > > > > > > > diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h > > > > index 1e5aeb39f774..e802ba8c67d6 100644 > > > > +++ b/drivers/infiniband/core/uverbs.h > > > > @@ -163,6 +163,7 @@ struct ib_uverbs_file { > > > > struct page *disassociate_page; > > > > > > > > struct xarray idr; > > > > + struct file *sys_file; /* backpointer to system file object */ > > > > }; > > > > > > The 'struct file' has a lifetime strictly shorter than the > > > ib_uverbs_file, which is kref'd on its own lifetime. Having a back > > > pointer like this is confouding as it will be invalid for some of the > > > lifetime of the struct. > > > > Ah... ok. I really thought it was the other way around. > > > > __fput() should not call ib_uverbs_close() until the last reference on struct > > file is released... What holds references to struct ib_uverbs_file past that? > > Child fds hold onto the internal ib_uverbs_file until they are closed The FDs hold the struct file, don't they? > > > Perhaps I need to add this (untested)? > > > > diff --git a/drivers/infiniband/core/uverbs_main.c > > b/drivers/infiniband/core/uverbs_main.c > > index f628f9e4c09f..654e774d9cf2 100644 > > +++ b/drivers/infiniband/core/uverbs_main.c > > @@ -1125,6 +1125,8 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp) > > list_del_init(&file->list); > > mutex_unlock(&file->device->lists_mutex); > > > > + file->sys_file = NULL; > > Now this has unlocked updates to that data.. you'd need some lock and > get not zero pattern You can't call "get" here because I'm 99% sure we only get here when struct file has no references left... I could be wrong. It took me a while to work through the reference counting so I could have missed something. Ira
On Tue, Aug 13, 2019 at 08:48:42AM -0300, Jason Gunthorpe wrote: > On Mon, Aug 12, 2019 at 02:15:37PM -0700, Ira Weiny wrote: > > On Mon, Aug 12, 2019 at 02:56:15PM -0300, Jason Gunthorpe wrote: > > > On Mon, Aug 12, 2019 at 10:28:27AM -0700, Ira Weiny wrote: > > > > On Mon, Aug 12, 2019 at 10:00:40AM -0300, Jason Gunthorpe wrote: > > > > > On Fri, Aug 09, 2019 at 03:58:30PM -0700, ira.weiny@intel.com wrote: > > > > > > From: Ira Weiny <ira.weiny@intel.com> > > > > > > > > > > > > In order for MRs to be tracked against the open verbs context the ufile > > > > > > needs to have a pointer to hand to the GUP code. > > > > > > > > > > > > No references need to be taken as this should be valid for the lifetime > > > > > > of the context. > > > > > > > > > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > > > > > drivers/infiniband/core/uverbs.h | 1 + > > > > > > drivers/infiniband/core/uverbs_main.c | 1 + > > > > > > 2 files changed, 2 insertions(+) > > > > > > > > > > > > diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h > > > > > > index 1e5aeb39f774..e802ba8c67d6 100644 > > > > > > +++ b/drivers/infiniband/core/uverbs.h > > > > > > @@ -163,6 +163,7 @@ struct ib_uverbs_file { > > > > > > struct page *disassociate_page; > > > > > > > > > > > > struct xarray idr; > > > > > > + struct file *sys_file; /* backpointer to system file object */ > > > > > > }; > > > > > > > > > > The 'struct file' has a lifetime strictly shorter than the > > > > > ib_uverbs_file, which is kref'd on its own lifetime. Having a back > > > > > pointer like this is confouding as it will be invalid for some of the > > > > > lifetime of the struct. > > > > > > > > Ah... ok. I really thought it was the other way around. > > > > > > > > __fput() should not call ib_uverbs_close() until the last reference on struct > > > > file is released... What holds references to struct ib_uverbs_file past that? > > > > > > Child fds hold onto the internal ib_uverbs_file until they are closed > > > > The FDs hold the struct file, don't they? > > Only dups, there are other 'child' FDs we can create > > > > Now this has unlocked updates to that data.. you'd need some lock and > > > get not zero pattern > > > > You can't call "get" here because I'm 99% sure we only get here when struct > > file has no references left... > > Nope, like I said the other FDs hold the uverbs_file independent of > the struct file it is related too. <sigh> We don't allow memory registrations to be created with those other FDs... And I was pretty sure uverbs_destroy_ufile_hw() would take care of (or ensure that some other thread is) destroying all the MR's we have associated with this FD. I'll have to think on this more since uverbs_destroy_ufile_hw() does not block... Which means there is a window here within the GUP code... :-/ > > This is why having a back pointer like this is so ugly, it creates a > reference counting cycle Yep... I worked through this... and it was giving me fits... Anyway, the struct file is the only object in the core which was reasonable to store this information in since that is what is passed around to other processes... Another idea I explored was to create a callback into the driver from the core which put the responsibility of printing the pin information on the driver. But that started to be (and is likely going to be) a pretty complicated "dance" between the core and the drivers so I went this way... I also thought about holding some other reference on struct file which would allow release to be called while keeping struct file around. But that seemed crazy... Ira
On Tue, Aug 13, 2019 at 03:00:22PM -0300, Jason Gunthorpe wrote: > On Tue, Aug 13, 2019 at 10:41:42AM -0700, Ira Weiny wrote: > > > And I was pretty sure uverbs_destroy_ufile_hw() would take care of (or ensure > > that some other thread is) destroying all the MR's we have associated with this > > FD. > > fd's can't be revoked, so destroy_ufile_hw() can't touch them. It > deletes any underlying HW resources, but the FD persists. I misspoke. I should have said associated with this "context". And of course uverbs_destroy_ufile_hw() does not touch the FD. What I mean is that the struct file which had file_pins hanging off of it would be getting its file pins destroyed by uverbs_destroy_ufile_hw(). Therefore we don't need the FD after uverbs_destroy_ufile_hw() is done. But since it does not block it may be that the struct file is gone before the MR is actually destroyed. Which means I think the GUP code would blow up in that case... :-( I was thinking it was the other way around. And in fact most of the time I think it is. But we can't depend on that... > > > > This is why having a back pointer like this is so ugly, it creates a > > > reference counting cycle > > > > Yep... I worked through this... and it was giving me fits... > > > > Anyway, the struct file is the only object in the core which was reasonable to > > store this information in since that is what is passed around to other > > processes... > > It could be passed down in the uattr_bundle, once you are in file operations What is "It"? The struct file? Or the file pin information? > handle the file is guarenteed to exist, and we've now arranged things I don't understand what you mean by "... once you are in file operations handle... "? > so the uattr_bundle flows into the umem, as umems can only be > established under a system call. "uattr_bundle" == uverbs_attr_bundle right? The problem is that I don't think the core should be handling uverbs_attr_bundles directly. So, I think you are driving at the same idea I had WRT callbacks into the driver. The drivers could provide some generic object (in RDMA this could be the uverbs_attr_bundle) which represents their "context". The GUP code calls back into the driver with file pin information as it encounters and pins pages. The driver, RDMA in this case, associates this information with the "context". But for the procfs interface, that context then needs to be associated with any file which points to it... For RDMA, or any other "FD based pin mechanism", it would be up to the driver to "install" a procfs handler into any struct file which _may_ point to this context. (before _or_ after memory pins). Then the procfs code can walk the FD array and if this handler is installed it knows there is file pin information associated with that struct file and it can be printed... This is not impossible. But I think is a lot harder for drivers to make right... Ira
On Wed, Aug 14, 2019 at 09:23:08AM -0300, Jason Gunthorpe wrote: > On Tue, Aug 13, 2019 at 01:38:59PM -0700, Ira Weiny wrote: > > On Tue, Aug 13, 2019 at 03:00:22PM -0300, Jason Gunthorpe wrote: > > > On Tue, Aug 13, 2019 at 10:41:42AM -0700, Ira Weiny wrote: > > > > > > > And I was pretty sure uverbs_destroy_ufile_hw() would take care of (or ensure > > > > that some other thread is) destroying all the MR's we have associated with this > > > > FD. > > > > > > fd's can't be revoked, so destroy_ufile_hw() can't touch them. It > > > deletes any underlying HW resources, but the FD persists. > > > > I misspoke. I should have said associated with this "context". And of course > > uverbs_destroy_ufile_hw() does not touch the FD. What I mean is that the > > struct file which had file_pins hanging off of it would be getting its file > > pins destroyed by uverbs_destroy_ufile_hw(). Therefore we don't need the FD > > after uverbs_destroy_ufile_hw() is done. > > > > But since it does not block it may be that the struct file is gone before the > > MR is actually destroyed. Which means I think the GUP code would blow up in > > that case... :-( > > Oh, yes, that is true, you also can't rely on the struct file living > longer than the HW objects either, that isn't how the lifetime model > works. > > If GUP consumes the struct file it must allow the struct file to be > deleted before the GUP pin is released. I may have to think about this a bit. But I'm starting to lean toward my callback method as a solution... > > > The drivers could provide some generic object (in RDMA this could be the > > uverbs_attr_bundle) which represents their "context". > > For RDMA the obvious context is the struct ib_mr * Not really, but maybe. See below regarding tracking this across processes. > > > But for the procfs interface, that context then needs to be associated with any > > file which points to it... For RDMA, or any other "FD based pin mechanism", it > > would be up to the driver to "install" a procfs handler into any struct file > > which _may_ point to this context. (before _or_ after memory pins). > > Is this all just for debugging? Seems like a lot of complication just > to print a string No, this is a requirement to allow an admin to determine why their truncates may be failing. As per our discussion here: https://lkml.org/lkml/2019/6/7/982 Looking back at the thread apparently no one confirmed my question (assertion). But no one objected to it either! :-D From that post: "... if we can keep track of who has the pins in lsof can we agree no process needs to be SIGKILL'ed? Admins can do this on their own "killing" if they really need to stop the use of these files, right?" This is what I am trying to do here is ensure that no matter what the user does. Fork, munmap, SCM_RIGHTS, close (on any FD), the underlying pin is associated to any process which has access to those pins and is holding references to those pages. Then any user of the system who gets a failing truncate can figure out which processes are holding this up. > > Generally, I think you'd be better to associate things with the > mm_struct not some struct file... The whole design is simpler as GUP > already has the mm_struct. I wish I _could_ do that... And for some simple users I do that. This is why rdma_pin has the option to track against mm_struct _OR_ struct file. At first it seemed like carrying over the mm_struct info during fork would work... but then there is SCM_RIGHTS where one can share the RDMA context with any "random" process... AFAICS struct file has no concept of mm_struct (nor should it) so the dup for SCM_RIGHTS processing would not be able to do this. A further complication was that when the RDMA FD is dup'ed the RDMA subsystem does not know about it... So it was not straight forward to have the RDMA subsystem do this either. Not to mention that would be yet another complication the drivers would have to deal with... I think you had similar issues which lead to the use of an "owning_mm" in the umem object. So while _some_ mm_struct is held it may not be visible to the user since that mm_struct may belong to a process which is gone... Or even if not gone, killing it would not fully remove the pin... So keeping this tracked against struct file works (and seemed straight forward) no matter where/how the RDMA FD is shared... Even with the complication above I still think it is easier to do this way. If I am missing something WRT the mm_struct "I'm all ears". Ira
On Wed, Aug 14, 2019 at 09:23:08AM -0300, Jason Gunthorpe wrote: > On Tue, Aug 13, 2019 at 01:38:59PM -0700, Ira Weiny wrote: > > On Tue, Aug 13, 2019 at 03:00:22PM -0300, Jason Gunthorpe wrote: > > > On Tue, Aug 13, 2019 at 10:41:42AM -0700, Ira Weiny wrote: > > > > > > > And I was pretty sure uverbs_destroy_ufile_hw() would take care of (or ensure > > > > that some other thread is) destroying all the MR's we have associated with this > > > > FD. > > > > > > fd's can't be revoked, so destroy_ufile_hw() can't touch them. It > > > deletes any underlying HW resources, but the FD persists. > > > > I misspoke. I should have said associated with this "context". And of course > > uverbs_destroy_ufile_hw() does not touch the FD. What I mean is that the > > struct file which had file_pins hanging off of it would be getting its file > > pins destroyed by uverbs_destroy_ufile_hw(). Therefore we don't need the FD > > after uverbs_destroy_ufile_hw() is done. > > > > But since it does not block it may be that the struct file is gone before the > > MR is actually destroyed. Which means I think the GUP code would blow up in > > that case... :-( > > Oh, yes, that is true, you also can't rely on the struct file living > longer than the HW objects either, that isn't how the lifetime model > works. Reviewing all these old threads... And this made me think. While the HW objects may out live the struct file. They _are_ going away in a finite amount of time right? It is not like they could be held forever right? Ira > > If GUP consumes the struct file it must allow the struct file to be > deleted before the GUP pin is released. > > > The drivers could provide some generic object (in RDMA this could be the > > uverbs_attr_bundle) which represents their "context". > > For RDMA the obvious context is the struct ib_mr * > > > But for the procfs interface, that context then needs to be associated with any > > file which points to it... For RDMA, or any other "FD based pin mechanism", it > > would be up to the driver to "install" a procfs handler into any struct file > > which _may_ point to this context. (before _or_ after memory pins). > > Is this all just for debugging? Seems like a lot of complication just > to print a string > > Generally, I think you'd be better to associate things with the > mm_struct not some struct file... The whole design is simpler as GUP > already has the mm_struct. > > Jason
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 1e5aeb39f774..e802ba8c67d6 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -163,6 +163,7 @@ struct ib_uverbs_file { struct page *disassociate_page; struct xarray idr; + struct file *sys_file; /* backpointer to system file object */ }; struct ib_uverbs_event { diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 11c13c1381cf..002c24e0d4db 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -1092,6 +1092,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp) INIT_LIST_HEAD(&file->umaps); filp->private_data = file; + file->sys_file = filp; list_add_tail(&file->list, &dev->uverbs_file_list); mutex_unlock(&dev->lists_mutex); srcu_read_unlock(&dev->disassociate_srcu, srcu_key);