Message ID | 20210216181925.650082-1-tomas.winkler@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | drm/i915/spi: discrete graphics internal spi | expand |
)On Tue, Feb 16, 2021 at 7:26 PM Tomas Winkler <tomas.winkler@intel.com> wrote: > Because the graphic card may undergo reset at any time and basically hot > unplug all its child devices, this series also provides a fix to the mtd > framework to make the reset graceful. Well, just because MTD does not work as you expect, it is not broken. :-) In your case i915_spi_remove() blindly removes the MTD, this is not allowed. You may remove the MTD only if there are no more users. The current model in MTD is that the driver is in charge of all life cycle management. Using ->_get_device() and ->_put_device() a driver can implement refcounting and deny new users if the MTD is about to disappear. In the upcoming MUSE driver that mechanism is used too. MUSE allows to implement a MTD in userspace. So the FUSE server can disappear at *any* time. Just like in your case. Even worse, it can be hostile. In MUSE the MTD life time is tied to the FUSE connection object, muse_mtd_get_device() increments the FUSE connection refcount, and muse_mtd_put_device() decrements it. That means if the FUSE server disappears all of a sudden but the MTD still has users, the MTD will stay. But in this state no new references are allowed and all MTD operations of existing users will fail with -ENOTCONN (via FUSE). As soon the last user is gone (can be userspace via /dev/mtd* or a in-kernel user such as UBIFS), the MTD will be removed. For the full details, please see: https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/tree/fs/fuse/muse.c?h=muse_v3#n1034 Is in your case *really* not possible to do it that way? On the other hand, your last patch moves some part of the life cycle management into MTD core. The MTD will stay as long it has users. But that's only one part. The driver is still in charge to make sure that all operations fail immediately and that no new users arrive. If we want to do all in MTD core we'd have to do it like SCSI disks. That means having devices states such as SDEV_RUNNING, SDEV_CANCEL, SDEV_OFFLINE, .... That way the MTD could be shutdown gracefully, first no new users are allowed, then ongoing operations will be cancelled, next all operation will fail with -EIO or such, then the device is being removed from sysfs and finally if the last user is gone, the MTD can be removed. I'm not sure whether we want to take that path.
> > )On Tue, Feb 16, 2021 at 7:26 PM Tomas Winkler <tomas.winkler@intel.com> > wrote: > > Because the graphic card may undergo reset at any time and basically > > hot unplug all its child devices, this series also provides a fix to > > the mtd framework to make the reset graceful. > > Well, just because MTD does not work as you expect, it is not broken. :-) I'm not saying it's broken by design it just didn't fit this use case. > > In your case i915_spi_remove() blindly removes the MTD, this is not allowed. > You may remove the MTD only if there are no more users. I'm not sure it's good idea to stall the removal on user space. This is just asking for a deadlock as user space is not getting what it needs and may stall I think it's better the user space will fail gracefully the hw is not accessible in that stage anyway. > > The current model in MTD is that the driver is in charge of all life cycle > management. > Using ->_get_device() and ->_put_device() a driver can implement > refcounting and deny new users if the MTD is about to disappear. Please note that this use case you are describing is still valid, I haven't removed _get_device() _put_device() handlers, You can still stall the removal of mtd, If this is not that way it's a bug > > In the upcoming MUSE driver that mechanism is used too. > MUSE allows to implement a MTD in userspace. So the FUSE server can > disappear at > *any* time. Just like in your case. Even worse, it can be hostile. > In MUSE the MTD life time is tied to the FUSE connection object, > muse_mtd_get_device() > increments the FUSE connection refcount, and muse_mtd_put_device() > decrements it. > That means if the FUSE server disappears all of a sudden but the MTD still has > users, the MTD will stay. But in this state no new references are allowed and > all MTD operations of existing users will fail with -ENOTCONN (via FUSE). > As soon the last user is gone (can be userspace via /dev/mtd* or a in-kernel > user such as UBIFS), the MTD will be removed. But in our case whole i915 is taken hostage, it cannot reset because of misbehaving user space. > For the full details, please see: > https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/tree/fs/fuse/m > use.c?h=muse_v3#n1034 > > Is in your case *really* not possible to do it that way? Maybe it's possible but I don't think it's good to stall i915 removal. Also It's very easily to crash the kernel. I've posted a sniped to the mailing list that tried to do that, the kernel still has crashed. Can you looked at? > On the other hand, your last patch moves some part of the life cycle > management into MTD core. > The MTD will stay as long it has users. > But that's only one part. The driver is still in charge to make sure that all > operations fail immediately and that no new users arrive. I think that case I would need to validate every HW access to make sure it's still valid. > If we want to do all in MTD core we'd have to do it like SCSI disks. > That means having devices states such as SDEV_RUNNING, SDEV_CANCEL, > SDEV_OFFLINE, .... > That way the MTD could be shutdown gracefully, first no new users are > allowed, then ongoing operations will be cancelled, next all operation will fail > with -EIO or such, then the device is being removed from sysfs and finally if > the last user is gone, the MTD can be removed. Isn't that already that way? You cannot open new handler. That I would need more of your insights. > > I'm not sure whether we want to take that path. Thanks Tomas
On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: > Intel discrete graphic devices have internal spi storage, that holds > firmware and oprom images. The spi device is exposed to the user space > via mtd framework to be accessed during manufacturing. > The device is hardware locked after manufacturing and only read access > is provided. > > The i915 plays role of a multi function device (mfd) and spi device > is exposed as its child device. i915_spi platform driver binds to > this device. What's the plan wrt i915/spi maintainership? BR, Jani. > > Because the graphic card may undergo reset at any time and basically hot > unplug all its child devices, this series also provides a fix to the mtd > framework to make the reset graceful. > > Tomas Winkler (9): > drm/i915/spi: add spi device for discrete graphics > drm/i915/spi: intel_spi_region map > drm/i915/spi: add driver for on-die spi device > drm/i915/spi: implement region enumeration > drm/i915/spi: implement spi access functions > drm/i915/spi: spi register with mtd > drm/i915/spi: mtd: implement access handlers > drm/i915/spi: serialize spi access > mtd: use refcount to prevent corruption > > drivers/gpu/drm/i915/Kconfig | 3 + > drivers/gpu/drm/i915/Makefile | 6 + > drivers/gpu/drm/i915/i915_drv.c | 9 + > drivers/gpu/drm/i915/i915_drv.h | 4 + > drivers/gpu/drm/i915/i915_reg.h | 1 + > drivers/gpu/drm/i915/spi/intel_spi.c | 62 +++ > drivers/gpu/drm/i915/spi/intel_spi.h | 24 + > drivers/gpu/drm/i915/spi/intel_spi_drv.c | 675 +++++++++++++++++++++++ > drivers/mtd/mtdcore.c | 63 ++- > drivers/mtd/mtdcore.h | 1 + > drivers/mtd/mtdpart.c | 13 +- > include/linux/mtd/mtd.h | 2 +- > 12 files changed, 831 insertions(+), 32 deletions(-) > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi.c > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi.h > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi_drv.c
On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: > Intel discrete graphic devices have internal spi storage, that holds > firmware and oprom images. The spi device is exposed to the user space > via mtd framework to be accessed during manufacturing. > The device is hardware locked after manufacturing and only read access > is provided. > > The i915 plays role of a multi function device (mfd) and spi device > is exposed as its child device. i915_spi platform driver binds to > this device. > > Because the graphic card may undergo reset at any time and basically hot > unplug all its child devices, this series also provides a fix to the mtd > framework to make the reset graceful. > > Tomas Winkler (9): > drm/i915/spi: add spi device for discrete graphics > drm/i915/spi: intel_spi_region map > drm/i915/spi: add driver for on-die spi device > drm/i915/spi: implement region enumeration > drm/i915/spi: implement spi access functions > drm/i915/spi: spi register with mtd > drm/i915/spi: mtd: implement access handlers > drm/i915/spi: serialize spi access > mtd: use refcount to prevent corruption > > drivers/gpu/drm/i915/Kconfig | 3 + > drivers/gpu/drm/i915/Makefile | 6 + > drivers/gpu/drm/i915/i915_drv.c | 9 + > drivers/gpu/drm/i915/i915_drv.h | 4 + > drivers/gpu/drm/i915/i915_reg.h | 1 + > drivers/gpu/drm/i915/spi/intel_spi.c | 62 +++ > drivers/gpu/drm/i915/spi/intel_spi.h | 24 + I'm open to discussion, but after glancing through the series I've got a gut feeling spi/ subdir should be purely about the separate module, and the above two files should be in i915/ directory instead. As it is, I think it's a bit confusing that spi/ is both about the spi kernel module and a singly .c file that's really part of i915.ko. Perhaps that messes up the conventional descending to subdirs in the kernel build too? BR, Jani. > drivers/gpu/drm/i915/spi/intel_spi_drv.c | 675 +++++++++++++++++++++++ > drivers/mtd/mtdcore.c | 63 ++- > drivers/mtd/mtdcore.h | 1 + > drivers/mtd/mtdpart.c | 13 +- > include/linux/mtd/mtd.h | 2 +- > 12 files changed, 831 insertions(+), 32 deletions(-) > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi.c > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi.h > create mode 100644 drivers/gpu/drm/i915/spi/intel_spi_drv.c
> > On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: > > Intel discrete graphic devices have internal spi storage, that holds > > firmware and oprom images. The spi device is exposed to the user space > > via mtd framework to be accessed during manufacturing. > > The device is hardware locked after manufacturing and only read access > > is provided. > > > > The i915 plays role of a multi function device (mfd) and spi device is > > exposed as its child device. i915_spi platform driver binds to this > > device. > > What's the plan wrt i915/spi maintainership? My suggestions is that this will be maintained by myself, as the major consumer is the manufacturing line. It will be a separate section in MAINTAINERS file. Thanks Tomas
On Wed, 17 Feb 2021, "Winkler, Tomas" <tomas.winkler@intel.com> wrote: >> >> On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: >> > Intel discrete graphic devices have internal spi storage, that holds >> > firmware and oprom images. The spi device is exposed to the user space >> > via mtd framework to be accessed during manufacturing. >> > The device is hardware locked after manufacturing and only read access >> > is provided. >> > >> > The i915 plays role of a multi function device (mfd) and spi device is >> > exposed as its child device. i915_spi platform driver binds to this >> > device. >> >> What's the plan wrt i915/spi maintainership? > > My suggestions is that this will be maintained by myself, as the major > consumer is the manufacturing line. It will be a separate section in > MAINTAINERS file. Works for me. Do you want to apply the patches directly to drm-intel, or your own branch and send pull requests to i915 maintainers? Can also start with the former, and move to the latter as needed. Joonas, Rodrigo, thoughts? BR, Jani.
> > On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: > > Intel discrete graphic devices have internal spi storage, that holds > > firmware and oprom images. The spi device is exposed to the user space > > via mtd framework to be accessed during manufacturing. > > The device is hardware locked after manufacturing and only read access > > is provided. > > > > The i915 plays role of a multi function device (mfd) and spi device is > > exposed as its child device. i915_spi platform driver binds to this > > device. > > > > Because the graphic card may undergo reset at any time and basically > > hot unplug all its child devices, this series also provides a fix to > > the mtd framework to make the reset graceful. > > > > Tomas Winkler (9): > > drm/i915/spi: add spi device for discrete graphics > > drm/i915/spi: intel_spi_region map > > drm/i915/spi: add driver for on-die spi device > > drm/i915/spi: implement region enumeration > > drm/i915/spi: implement spi access functions > > drm/i915/spi: spi register with mtd > > drm/i915/spi: mtd: implement access handlers > > drm/i915/spi: serialize spi access > > mtd: use refcount to prevent corruption > > > > drivers/gpu/drm/i915/Kconfig | 3 + > > drivers/gpu/drm/i915/Makefile | 6 + > > drivers/gpu/drm/i915/i915_drv.c | 9 + > > drivers/gpu/drm/i915/i915_drv.h | 4 + > > drivers/gpu/drm/i915/i915_reg.h | 1 + > > drivers/gpu/drm/i915/spi/intel_spi.c | 62 +++ > > drivers/gpu/drm/i915/spi/intel_spi.h | 24 + > > I'm open to discussion, but after glancing through the series I've got a gut > feeling spi/ subdir should be purely about the separate module, and the > above two files should be in i915/ directory instead. Maybe, I don't have strong feelings about that, it is just a decision from which point you want to look at that. > > As it is, I think it's a bit confusing that spi/ is both about the spi kernel module > and a singly .c file that's really part of i915.ko. Perhaps that messes up the > conventional descending to subdirs in the kernel build too? The intention was to make this capsulated from the file system point of view. In general the spi driver could be somewhere in mtd directory, but it doesn't really fit exactly there either. I don't have a strong opinion about that, if you do I yield. Thanks Tomas
On Wed, Feb 17, 2021 at 03:35:19PM +0200, Jani Nikula wrote: > On Wed, 17 Feb 2021, "Winkler, Tomas" <tomas.winkler@intel.com> wrote: > >> > >> On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: > >> > Intel discrete graphic devices have internal spi storage, that holds > >> > firmware and oprom images. The spi device is exposed to the user space > >> > via mtd framework to be accessed during manufacturing. > >> > The device is hardware locked after manufacturing and only read access > >> > is provided. > >> > > >> > The i915 plays role of a multi function device (mfd) and spi device is > >> > exposed as its child device. i915_spi platform driver binds to this > >> > device. > >> > >> What's the plan wrt i915/spi maintainership? > > > > My suggestions is that this will be maintained by myself, as the major > > consumer is the manufacturing line. It will be a separate section in > > MAINTAINERS file. > > Works for me. Do you want to apply the patches directly to drm-intel, or > your own branch and send pull requests to i915 maintainers? Can also > start with the former, and move to the latter as needed. > > Joonas, Rodrigo, thoughts? No strong opinion here. But I believe the pull request flow makes sense. > > BR, > Jani. > > -- > Jani Nikula, Intel Open Source Graphics Center > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > > > > > )On Tue, Feb 16, 2021 at 7:26 PM Tomas Winkler > > <tomas.winkler@intel.com> > > wrote: > > > Because the graphic card may undergo reset at any time and basically > > > hot unplug all its child devices, this series also provides a fix to > > > the mtd framework to make the reset graceful. > > > > Well, just because MTD does not work as you expect, it is not broken. > > :-) > I'm not saying it's broken by design it just didn't fit this use case. > > > > In your case i915_spi_remove() blindly removes the MTD, this is not > allowed. > > You may remove the MTD only if there are no more users. > > I'm not sure it's good idea to stall the removal on user space. > This is just asking for a deadlock as user space is not getting what it needs and > may stall I think it's better the user space will fail gracefully the hw is not > accessible in that stage anyway. > > > > The current model in MTD is that the driver is in charge of all life > > cycle management. > > Using ->_get_device() and ->_put_device() a driver can implement > > refcounting and deny new users if the MTD is about to disappear. > > Please note that this use case you are describing is still valid, I haven't > removed _get_device() _put_device() handlers, You can still stall the > removal of mtd, If this is not that way it's a bug > > > > > In the upcoming MUSE driver that mechanism is used too. > > MUSE allows to implement a MTD in userspace. So the FUSE server can > > disappear at > > *any* time. Just like in your case. Even worse, it can be hostile. > > In MUSE the MTD life time is tied to the FUSE connection object, > > muse_mtd_get_device() > > increments the FUSE connection refcount, and muse_mtd_put_device() > > decrements it. > > That means if the FUSE server disappears all of a sudden but the MTD > > still has users, the MTD will stay. But in this state no new > > references are allowed and all MTD operations of existing users will fail > with -ENOTCONN (via FUSE). > > As soon the last user is gone (can be userspace via /dev/mtd* or a > > in-kernel user such as UBIFS), the MTD will be removed. > > But in our case whole i915 is taken hostage, it cannot reset because of > misbehaving user space. > > > For the full details, please see: > > https://git.kernel.org/pub/scm/linux/kernel/git/rw/misc.git/tree/fs/fu > > se/m > > use.c?h=muse_v3#n1034 > > > > Is in your case *really* not possible to do it that way? > > Maybe it's possible but I don't think it's good to stall i915 removal. Also It's > very easily to crash the kernel. > I've posted a sniped to the mailing list that tried to do that, the kernel still has > crashed. Can you looked at? > > > On the other hand, your last patch moves some part of the life cycle > > management into MTD core. > > The MTD will stay as long it has users. > > But that's only one part. The driver is still in charge to make sure > > that all operations fail immediately and that no new users arrive. > > I think that case I would need to validate every HW access to make sure it's > still valid. > > > If we want to do all in MTD core we'd have to do it like SCSI disks. > > That means having devices states such as SDEV_RUNNING, SDEV_CANCEL, > > SDEV_OFFLINE, .... > > That way the MTD could be shutdown gracefully, first no new users are > > allowed, then ongoing operations will be cancelled, next all operation > > will fail with -EIO or such, then the device is being removed from > > sysfs and finally if the last user is gone, the MTD can be removed. > > Isn't that already that way? You cannot open new handler. That I would need > more of your insights. > > > > I'm not sure whether we want to take that path. Hi Richard is there any way we can try to unclutter this ? Thanks Tomas
----- Ursprüngliche Mail ----- >> > I'm not sure whether we want to take that path. > > Hi Richard is there any way we can try to unclutter this ? Of course there is a way. :-) Your use-case really seems to be special and MTD needs an improvement. Miquel, Vignesh and I just need to check more internals and corner cases in MTD. With some luck your patch can be used as-is with some minor adjustments on top. Thanks, //richard
> ----- Ursprüngliche Mail ----- > >> > I'm not sure whether we want to take that path. > > > > Hi Richard is there any way we can try to unclutter this ? > > Of course there is a way. :-) > Your use-case really seems to be special and MTD needs an improvement. > Miquel, Vignesh and I just need to check more internals and corner cases in > MTD. > With some luck your patch can be used as-is with some minor adjustments > on top. Great, waiting for your comments. Thanks Tomas
> > > > ----- Ursprüngliche Mail ----- > > >> > I'm not sure whether we want to take that path. > > > > > > Hi Richard is there any way we can try to unclutter this ? > > > > Of course there is a way. :-) > > Your use-case really seems to be special and MTD needs an improvement. > > Miquel, Vignesh and I just need to check more internals and corner > > cases in MTD. > > With some luck your patch can be used as-is with some minor > > adjustments on top. > Great, waiting for your comments. Hi, I would like to send out the second version of the series. Please if you have any comments let me know. Thanks Tomas
On Wed, 17 Feb 2021, "Winkler, Tomas" <tomas.winkler@intel.com> wrote: >> >> On Tue, 16 Feb 2021, Tomas Winkler <tomas.winkler@intel.com> wrote: >> > Intel discrete graphic devices have internal spi storage, that holds >> > firmware and oprom images. The spi device is exposed to the user space >> > via mtd framework to be accessed during manufacturing. >> > The device is hardware locked after manufacturing and only read access >> > is provided. >> > >> > The i915 plays role of a multi function device (mfd) and spi device is >> > exposed as its child device. i915_spi platform driver binds to this >> > device. >> > >> > Because the graphic card may undergo reset at any time and basically >> > hot unplug all its child devices, this series also provides a fix to >> > the mtd framework to make the reset graceful. >> > >> > Tomas Winkler (9): >> > drm/i915/spi: add spi device for discrete graphics >> > drm/i915/spi: intel_spi_region map >> > drm/i915/spi: add driver for on-die spi device >> > drm/i915/spi: implement region enumeration >> > drm/i915/spi: implement spi access functions >> > drm/i915/spi: spi register with mtd >> > drm/i915/spi: mtd: implement access handlers >> > drm/i915/spi: serialize spi access >> > mtd: use refcount to prevent corruption >> > >> > drivers/gpu/drm/i915/Kconfig | 3 + >> > drivers/gpu/drm/i915/Makefile | 6 + >> > drivers/gpu/drm/i915/i915_drv.c | 9 + >> > drivers/gpu/drm/i915/i915_drv.h | 4 + >> > drivers/gpu/drm/i915/i915_reg.h | 1 + >> > drivers/gpu/drm/i915/spi/intel_spi.c | 62 +++ >> > drivers/gpu/drm/i915/spi/intel_spi.h | 24 + >> >> I'm open to discussion, but after glancing through the series I've got a gut >> feeling spi/ subdir should be purely about the separate module, and the >> above two files should be in i915/ directory instead. > > Maybe, I don't have strong feelings about that, it is just a decision from which point you want to look at that. *shrug* No strong feelings either, and I don't think the decision is carved in stone. We can move them around later if we want. Up to you. BR, Jani. >> >> As it is, I think it's a bit confusing that spi/ is both about the spi kernel module >> and a singly .c file that's really part of i915.ko. Perhaps that messes up the >> conventional descending to subdirs in the kernel build too? > > The intention was to make this capsulated from the file system point of view. > In general the spi driver could be somewhere in mtd directory, but it doesn't really fit exactly there either. > I don't have a strong opinion about that, if you do I yield. > > > Thanks > Tomas >