Message ID | 20240808183340.483468-4-martin.oliveira@eideticom.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | Enable P2PDMA in Userspace RDMA | expand |
On Thu, Aug 08, 2024 at 12:33:39PM -0600, Martin Oliveira wrote: > This check existed originally due to concerns that P2PDMA needed to copy > fsdax until pgmap refcounts were fixed (see [1]). > > The P2PDMA infrastructure will only call unmap_mapping_range() when the > underlying device is unbound, and immediately after unmapping it waits > for the reference of all ZONE_DEVICE pages to be released before > continuing. This does not allow for a page to be reused and no user > access fault is therefore possible. It does not have the same problem as > fsdax. > > The one minor concern with FOLL_LONGTERM pins is they will block device > unbind until userspace releases them all. This is unfortunately not really minor unless we have a well documented way to force this :(
On Sun, Aug 11, 2024 at 11:39:22PM -0700, Christoph Hellwig wrote: > On Thu, Aug 08, 2024 at 12:33:39PM -0600, Martin Oliveira wrote: > > This check existed originally due to concerns that P2PDMA needed to copy > > fsdax until pgmap refcounts were fixed (see [1]). > > > > The P2PDMA infrastructure will only call unmap_mapping_range() when the > > underlying device is unbound, and immediately after unmapping it waits > > for the reference of all ZONE_DEVICE pages to be released before > > continuing. This does not allow for a page to be reused and no user > > access fault is therefore possible. It does not have the same problem as > > fsdax. > > > > The one minor concern with FOLL_LONGTERM pins is they will block device > > unbind until userspace releases them all. > > This is unfortunately not really minor unless we have a well documented > way to force this :( It is not that different from blocking driver unbind while FDs are open which a lot of places do in various ways? Jason
On Mon, Aug 12, 2024 at 08:12:49PM -0300, Jason Gunthorpe wrote: > > This is unfortunately not really minor unless we have a well documented > > way to force this :( > > It is not that different from blocking driver unbind while FDs are > open which a lot of places do in various ways? Where do we block driver unbind with an open resource? The whole concept is that open resources will pin the in-memory object (and modulo for a modular driver), but never an unbind or hardware unplug, of which unbind really just is a simulation.
On Mon, Aug 12, 2024 at 10:41:20PM -0700, Christoph Hellwig wrote: > On Mon, Aug 12, 2024 at 08:12:49PM -0300, Jason Gunthorpe wrote: > > > This is unfortunately not really minor unless we have a well documented > > > way to force this :( > > > > It is not that different from blocking driver unbind while FDs are > > open which a lot of places do in various ways? > > Where do we block driver unbind with an open resource? I keep seeing it in different subsystems, safe driver unbind is really hard. :\ eg I think VFIO has some waits in it > The whole concept is that open resources will pin the in-memory > object (and modulo for a modular driver), but never an unbind or > hardware unplug, of which unbind really just is a simulation. Yes, ideally, but not every part of the kernel hits that ideal in my experience. It is alot of work and some places don't have any good solutions, like here. Jason
Jason Gunthorpe wrote: > On Mon, Aug 12, 2024 at 10:41:20PM -0700, Christoph Hellwig wrote: > > On Mon, Aug 12, 2024 at 08:12:49PM -0300, Jason Gunthorpe wrote: > > > > This is unfortunately not really minor unless we have a well documented > > > > way to force this :( > > > > > > It is not that different from blocking driver unbind while FDs are > > > open which a lot of places do in various ways? > > > > Where do we block driver unbind with an open resource? > > I keep seeing it in different subsystems, safe driver unbind is really > hard. :\ eg I think VFIO has some waits in it > > > The whole concept is that open resources will pin the in-memory > > object (and modulo for a modular driver), but never an unbind or > > hardware unplug, of which unbind really just is a simulation. > > Yes, ideally, but not every part of the kernel hits that ideal in my > experience. It is alot of work and some places don't have any good > solutions, like here. ...but there is a distinction between transient and permanent waits, right? The difficult aspect of FOLL_LONGTERM is the holder has no idea someone is trying to cleanup and may never drop its pin.
On Tue, Aug 13, 2024 at 10:03:55AM -0700, Dan Williams wrote: > Jason Gunthorpe wrote: > > On Mon, Aug 12, 2024 at 10:41:20PM -0700, Christoph Hellwig wrote: > > > On Mon, Aug 12, 2024 at 08:12:49PM -0300, Jason Gunthorpe wrote: > > > > > This is unfortunately not really minor unless we have a well documented > > > > > way to force this :( > > > > > > > > It is not that different from blocking driver unbind while FDs are > > > > open which a lot of places do in various ways? > > > > > > Where do we block driver unbind with an open resource? > > > > I keep seeing it in different subsystems, safe driver unbind is really > > hard. :\ eg I think VFIO has some waits in it > > > > > The whole concept is that open resources will pin the in-memory > > > object (and modulo for a modular driver), but never an unbind or > > > hardware unplug, of which unbind really just is a simulation. > > > > Yes, ideally, but not every part of the kernel hits that ideal in my > > experience. It is alot of work and some places don't have any good > > solutions, like here. > > ...but there is a distinction between transient and permanent waits, > right? The difficult aspect of FOLL_LONGTERM is the holder has no idea > someone is trying to cleanup and may never drop its pin. It is the quite similar to userspace holding a FD open while a driver is trying to unbind. The FD holder has possibly no idea things are waiting on it. Nice subsystems allow the FD to keep existing while the driver is unplugged, but still many have to wait for the FD to close as disconnecting an active driver from it's FD requires some pretty careful design. Jason
On Tue, Aug 13, 2024 at 01:05:02PM -0300, Jason Gunthorpe wrote: > > Where do we block driver unbind with an open resource? > > I keep seeing it in different subsystems, safe driver unbind is really > hard. :\ eg I think VFIO has some waits in it Well, waits for what? Usuaully an unbind has to wait for something, but waiting for unbound references is always a bug. And I can't think of any sensible subsystem doing this.
On Tue, Aug 13, 2024 at 09:26:55PM -0700, Christoph Hellwig wrote: > On Tue, Aug 13, 2024 at 01:05:02PM -0300, Jason Gunthorpe wrote: > > > Where do we block driver unbind with an open resource? > > > > I keep seeing it in different subsystems, safe driver unbind is really > > hard. :\ eg I think VFIO has some waits in it > > Well, waits for what? Looks like it waits for the fds to close at least. It has weird handshake where it notifies userspace that the device is closing and the userspace needs to close the fd. See vfio_unregister_group_dev() In the past VFIO couldn't do anything else because it had VMAs open that point to the device's mmio and it would be wrong to unbind the driver and leave those uncleaned. VFIO now knows how to zap VMA so maybe this could be improved, but it looks like a sizable uplift. > Usuaully an unbind has to wait for something, but waiting for > unbound references is always a bug. And I can't think of any > sensible subsystem doing this. I've seen several cases over the years.. It can be hard in many cases to deal with the issue. Like the VMA's above for instance. rdma also had to learn how to do zap before it could do fast unbind. Some places just have bugs where they don't wait and Weird Stuff happens. There is a strange misconception that module refcounts protect against unbind :\ Not saying this is good design, or desirable, just pointing out we seem to tolerate this in enough other places. In this case the only resolution I could think of would be to have the rdma stack somehow register a notifier on the pgmap. Then we could revoke the RDMA access synchronously during destruction and return those refcounts. That does seems a bit complicated though.. Jason
diff --git a/mm/gup.c b/mm/gup.c index 54d0dc3831fb..df267b7890aa 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2547,11 +2547,6 @@ static bool is_valid_gup_args(struct page **pages, int *locked, if (WARN_ON_ONCE((gup_flags & (FOLL_GET | FOLL_PIN)) && !pages)) return false; - /* We want to allow the pgmap to be hot-unplugged at all times */ - if (WARN_ON_ONCE((gup_flags & FOLL_LONGTERM) && - (gup_flags & FOLL_PCI_P2PDMA))) - return false; - *gup_flags_p = gup_flags; return true; }