diff mbox series

[RFC] component: do not leave master devres group open after bind

Message ID 20210921111810.2766726-1-kai.vehmanen@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [RFC] component: do not leave master devres group open after bind | expand

Commit Message

Kai Vehmanen Sept. 21, 2021, 11:18 a.m. UTC
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.

This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:

component_master_add_with_match()
[  126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[  126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[  126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[  126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)

component_del() called() at DRM/i915 unbind()
[  137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[  137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[  137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after the master->ops->bind()
call is done. This allows devres allocations done by the driver acting as
master to be isolated from the binding state of the aggregate driver. This
modifies the logic originally introduced in commit 9e1ccb4a7700
("drivers/base: fix devres handling for master device").

BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
---
 drivers/base/component.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Hi,
I'm sending this as RFC as I'm not sure of the implications of
not leaving the devres group open might have to other users
of the component framework.

For audio, the current behaviour seems very problematic. The display
codec is usually just one of many audio codecs attached to the controller,
and unbind of the display codec (and the aggregate driver created with
DRM), should not bring down the whole audio card.

However, now all allocations audio driver does after call to
component_master_add_with_match(), will be freed when display
driver calls component_del().

Closing the devres group at end of component_master_add_*() would
seem the cleanest option. Looking for feedback whether this approach
is feasible. One alternative would be for the audio driver to
close the "last opened" group after its call to component_master_add(),
but this seems messy (audio would make assumptions on component.c
internals).


base-commit: 930e99a51fcc8b1254e0a45fbe0cd5a5b8a704a5

Comments

Imre Deak Sept. 21, 2021, 5:43 p.m. UTC | #1
On Tue, Sep 21, 2021 at 02:18:10PM +0300, Kai Vehmanen wrote:
> In current code, the devres group for aggregate master is left open
> after call to component_master_add_*(). This leads to problems when the
> master does further managed allocations on its own. When any
> participating driver calls component_del(), this leads to immediate
> release of resources.
> 
> This came up when investigating a page fault occurring with i915 DRM
> driver unbind with 5.15-rc1 kernel. The following sequence occurs:
> 
>  i915_pci_remove()
>    -> intel_display_driver_unregister()
>      -> i915_audio_component_cleanup()
>        -> component_del()
>          -> component.c:take_down_master()
>            -> hdac_component_master_unbind() [via master->ops->unbind()]
>            -> devres_release_group(master->parent, NULL)
> 
> With older kernels this has not caused issues, but with audio driver
> moving to use managed interfaces for more of its allocations, this no
> longer works. Devres log shows following to occur:
> 
> component_master_add_with_match()
> [  126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
> [  126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
> [  126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
> 
> audio driver completes its PCI probe()
> [  126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
> 
> component_del() called() at DRM/i915 unbind()
> [  137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
> [  137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
> [  137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
> 
> So the "devres_release_group(master->parent, NULL)" ends up freeing the
> pcim_iomap allocation. Upon next runtime resume, the audio driver will
> cause a page fault as the iomap alloc was released without the driver
> knowing about it.
> 
> Fix this issue by using the "struct master" pointer as identifier for
> the devres group, and by closing the devres group after the master->ops->bind()
> call is done. This allows devres allocations done by the driver acting as
> master to be isolated from the binding state of the aggregate driver. This
> modifies the logic originally introduced in commit 9e1ccb4a7700
> ("drivers/base: fix devres handling for master device").
> 
> BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
> Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>

This makes sense to me and also seems to match the devres logic for
component binding (component_bind()), where the only resources freed at
unbind (or at bind failure) are those that were allocated in the
components' bind hook. Any resource allocated later by the components
are not affected by the unbinding:

Acked-by: Imre Deak <imre.deak@intel.com>

If for some reason the current behaviour was intentional, one
alternative I can think of would be for audio to register itself with the
component framework using a dedicated kdev object.

Imo the component framework's devres semantics would need to be
documented as well.

> ---
>  drivers/base/component.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Hi,
> I'm sending this as RFC as I'm not sure of the implications of
> not leaving the devres group open might have to other users
> of the component framework.
> 
> For audio, the current behaviour seems very problematic. The display
> codec is usually just one of many audio codecs attached to the controller,
> and unbind of the display codec (and the aggregate driver created with
> DRM), should not bring down the whole audio card.
> 
> However, now all allocations audio driver does after call to
> component_master_add_with_match(), will be freed when display
> driver calls component_del().
> 
> Closing the devres group at end of component_master_add_*() would
> seem the cleanest option. Looking for feedback whether this approach
> is feasible. One alternative would be for the audio driver to
> close the "last opened" group after its call to component_master_add(),
> but this seems messy (audio would make assumptions on component.c
> internals).
> 
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index 5e79299f6c3f..870485cbbb87 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
>  		return 0;
>  	}
>  
> -	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
> +	if (!devres_open_group(master->parent, master, GFP_KERNEL))
>  		return -ENOMEM;
>  
>  	/* Found all components */
> @@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
>  		return ret;
>  	}
>  
> +	devres_close_group(master->parent, NULL);
>  	master->bound = true;
>  	return 1;
>  }
> @@ -282,7 +283,7 @@ static void take_down_master(struct master *master)
>  {
>  	if (master->bound) {
>  		master->ops->unbind(master->parent);
> -		devres_release_group(master->parent, NULL);
> +		devres_release_group(master->parent, master);
>  		master->bound = false;
>  	}
>  }
> 
> base-commit: 930e99a51fcc8b1254e0a45fbe0cd5a5b8a704a5
> -- 
> 2.32.0
>
Russell King (Oracle) Sept. 21, 2021, 9:49 p.m. UTC | #2
On Tue, Sep 21, 2021 at 02:18:10PM +0300, Kai Vehmanen wrote:
> In current code, the devres group for aggregate master is left open
> after call to component_master_add_*(). This leads to problems when the
> master does further managed allocations on its own. When any
> participating driver calls component_del(), this leads to immediate
> release of resources.
> 
> This came up when investigating a page fault occurring with i915 DRM
> driver unbind with 5.15-rc1 kernel. The following sequence occurs:
> 
>  i915_pci_remove()
>    -> intel_display_driver_unregister()
>      -> i915_audio_component_cleanup()
>        -> component_del()
>          -> component.c:take_down_master()
>            -> hdac_component_master_unbind() [via master->ops->unbind()]
>            -> devres_release_group(master->parent, NULL)
> 
> With older kernels this has not caused issues, but with audio driver
> moving to use managed interfaces for more of its allocations, this no
> longer works. Devres log shows following to occur:
> 
> component_master_add_with_match()
> [  126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
> [  126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
> [  126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)
> 
> audio driver completes its PCI probe()
> [  126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)
> 
> component_del() called() at DRM/i915 unbind()
> [  137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
> [  137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
> [  137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)
> 
> So the "devres_release_group(master->parent, NULL)" ends up freeing the
> pcim_iomap allocation. Upon next runtime resume, the audio driver will
> cause a page fault as the iomap alloc was released without the driver
> knowing about it.
> 
> Fix this issue by using the "struct master" pointer as identifier for
> the devres group, and by closing the devres group after the master->ops->bind()
> call is done. This allows devres allocations done by the driver acting as
> master to be isolated from the binding state of the aggregate driver. This
> modifies the logic originally introduced in commit 9e1ccb4a7700
> ("drivers/base: fix devres handling for master device").

Yes, this is the right fix - I did not expect people to be claiming
resources after adding the component master, since that is the point
at which the master can become active. Hence all resources that may
be used should either already be claimed, or (preferred) be claimed
when the master gets the bind call.

However, I think the i915 bits are more complex than that simple view,
so putting the component stuff inside its own devres group and closing
it at the end of try_to_bring_up_master() makes sense.

Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks.
diff mbox series

Patch

diff --git a/drivers/base/component.c b/drivers/base/component.c
index 5e79299f6c3f..870485cbbb87 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -246,7 +246,7 @@  static int try_to_bring_up_master(struct master *master,
 		return 0;
 	}
 
-	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+	if (!devres_open_group(master->parent, master, GFP_KERNEL))
 		return -ENOMEM;
 
 	/* Found all components */
@@ -258,6 +258,7 @@  static int try_to_bring_up_master(struct master *master,
 		return ret;
 	}
 
+	devres_close_group(master->parent, NULL);
 	master->bound = true;
 	return 1;
 }
@@ -282,7 +283,7 @@  static void take_down_master(struct master *master)
 {
 	if (master->bound) {
 		master->ops->unbind(master->parent);
-		devres_release_group(master->parent, NULL);
+		devres_release_group(master->parent, master);
 		master->bound = false;
 	}
 }