diff mbox series

[v1,3/4] hwmon (it87): Test for chipset before entering configuration mode

Message ID 20240401025620.205068-4-frank@crawford.emu.id.au (mailing list archive)
State Rejected
Headers show
Series hwmon (it87): Correct handling for configuration mode | expand

Commit Message

Frank Crawford April 1, 2024, 2:56 a.m. UTC
Major part of the change for the new method to avoid chipset issues.

Perform an intial test if the chip ID can be read without entering
configuration mode, if so, do not enter configuration mode and continue
as is.

If chip ID cannot be read, enter configuration mode, and continue with
previous code.

Also update exit code to take account of if we entered configuration
mode or not.

Signed-off-by: Frank Crawford <frank@crawford.emu.id.au>
---
 drivers/hwmon/it87.c | 52 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 47 insertions(+), 5 deletions(-)

Comments

Guenter Roeck April 10, 2024, 3:17 p.m. UTC | #1
On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote:
> Major part of the change for the new method to avoid chipset issues.
> 
> Perform an intial test if the chip ID can be read without entering
> configuration mode, if so, do not enter configuration mode and continue
> as is.
> 
> If chip ID cannot be read, enter configuration mode, and continue with
> previous code.
> 
> Also update exit code to take account of if we entered configuration
> mode or not.
> 

You describe the changes, but you don't describe the problem you are
trying to solve. Even if configuration mode was already entered, that
is not a reason to keep it active. We don't _know_ what is going on
outside the driver and can not make assumptions. For changes like this
you really need to explain the problem you are trying to solve, and the
reasoning behind the changes. Just assuming that a set of chips would
have its SIO mode enabled by the BIOS is just wrong. We don't know
what if anything the BIOS is doing.

Guenter
Frank Crawford April 11, 2024, 11:08 a.m. UTC | #2
On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote:
> On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote:
> > Major part of the change for the new method to avoid chipset
> > issues.
> > 
> > Perform an intial test if the chip ID can be read without entering
> > configuration mode, if so, do not enter configuration mode and
> > continue
> > as is.
> > 
> > If chip ID cannot be read, enter configuration mode, and continue
> > with
> > previous code.
> > 
> > Also update exit code to take account of if we entered
> > configuration
> > mode or not.
> > 
> 
> You describe the changes, but you don't describe the problem you are
> trying to solve. Even if configuration mode was already entered, that
> is not a reason to keep it active. We don't _know_ what is going on
> outside the driver and can not make assumptions. For changes like
> this
> you really need to explain the problem you are trying to solve, and
> the
> reasoning behind the changes. Just assuming that a set of chips would
> have its SIO mode enabled by the BIOS is just wrong. We don't know
> what if anything the BIOS is doing.

In this regard, it is no different to the current code, which runs SIO
mode enable, but does not ever run the SIO mode disable code.

In fact this code is actually safer than the previous code in that it
acts similar to the chip not being enabled or disabled, which would
happen if no driver existed.
> 
> Guenter
> 
Regards
Frank
Guenter Roeck April 11, 2024, 12:20 p.m. UTC | #3
On Thu, Apr 11, 2024 at 09:08:19PM +1000, Frank Crawford wrote:
> On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote:
> > On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote:
> > > Major part of the change for the new method to avoid chipset
> > > issues.
> > > 
> > > Perform an intial test if the chip ID can be read without entering
> > > configuration mode, if so, do not enter configuration mode and
> > > continue
> > > as is.
> > > 
> > > If chip ID cannot be read, enter configuration mode, and continue
> > > with
> > > previous code.
> > > 
> > > Also update exit code to take account of if we entered
> > > configuration
> > > mode or not.
> > > 
> > 
> > You describe the changes, but you don't describe the problem you are
> > trying to solve. Even if configuration mode was already entered, that
> > is not a reason to keep it active. We don't _know_ what is going on
> > outside the driver and can not make assumptions. For changes like
> > this
> > you really need to explain the problem you are trying to solve, and
> > the
> > reasoning behind the changes. Just assuming that a set of chips would
> > have its SIO mode enabled by the BIOS is just wrong. We don't know
> > what if anything the BIOS is doing.
> 
> In this regard, it is no different to the current code, which runs SIO
> mode enable, but does not ever run the SIO mode disable code.
> 
> In fact this code is actually safer than the previous code in that it
> acts similar to the chip not being enabled or disabled, which would
> happen if no driver existed.

Not really. There is also the watchdog code which will happily disable
SIO access after it is done, causing subsequent accesses by the hwmon
driver to fail. The code also assumes that SIO access was not erroneously
left enabled by some other code which we don't have any control over.

You assume that the hwmon driver is the only driver accessing the chip.
That is a wrong assumption. I understand that the underlying problem
is really that there is no SIO access infrastructure in the kernel.
In the absence of such an infrastructure we can not make any assumptions
about SIO access control implemented by other drivers in the kernel,
and specifically can not assume that SIO access won't be disabled by
other drivers just because it was enabled when the hwmon driver probe
function was running.

Guenter
Frank Crawford April 12, 2024, 10:11 a.m. UTC | #4
On Thu, 2024-04-11 at 05:20 -0700, Guenter Roeck wrote:
> On Thu, Apr 11, 2024 at 09:08:19PM +1000, Frank Crawford wrote:
> > On Wed, 2024-04-10 at 08:17 -0700, Guenter Roeck wrote:
> > > On Mon, Apr 01, 2024 at 01:56:05PM +1100, Frank Crawford wrote:
> > > > Major part of the change for the new method to avoid chipset
> > > > issues.
> > > > 
> > > > Perform an intial test if the chip ID can be read without
> > > > entering
> > > > configuration mode, if so, do not enter configuration mode and
> > > > continue
> > > > as is.
> > > > 
> > > > If chip ID cannot be read, enter configuration mode, and
> > > > continue
> > > > with
> > > > previous code.
> > > > 
> > > > Also update exit code to take account of if we entered
> > > > configuration
> > > > mode or not.
> > > > 
> > > 
> > > You describe the changes, but you don't describe the problem you
> > > are
> > > trying to solve. Even if configuration mode was already entered,
> > > that
> > > is not a reason to keep it active. We don't _know_ what is going
> > > on
> > > outside the driver and can not make assumptions. For changes like
> > > this
> > > you really need to explain the problem you are trying to solve,
> > > and
> > > the
> > > reasoning behind the changes. Just assuming that a set of chips
> > > would
> > > have its SIO mode enabled by the BIOS is just wrong. We don't
> > > know
> > > what if anything the BIOS is doing.
> > 
> > In this regard, it is no different to the current code, which runs
> > SIO
> > mode enable, but does not ever run the SIO mode disable code.
> > 
> > In fact this code is actually safer than the previous code in that
> > it
> > acts similar to the chip not being enabled or disabled, which would
> > happen if no driver existed.
> 
> Not really. There is also the watchdog code which will happily
> disable
> SIO access after it is done, causing subsequent accesses by the hwmon
> driver to fail. The code also assumes that SIO access was not
> erroneously
> left enabled by some other code which we don't have any control over.

And unfortunately if I can't do anything about it, I can only ignore
it.  If something does come up we can see what can work out at the
time.
> 
> You assume that the hwmon driver is the only driver accessing the
> chip.
> That is a wrong assumption. I understand that the underlying problem
> is really that there is no SIO access infrastructure in the kernel.
> In the absence of such an infrastructure we can not make any
> assumptions
> about SIO access control implemented by other drivers in the kernel,
> and specifically can not assume that SIO access won't be disabled by
> other drivers just because it was enabled when the hwmon driver probe
> function was running.

In this case the fact that it is the second chip may mean it will not
come up.  While I am told that the chip is fully functional with non-
hwmon functions, but currently it does look like most of those aren't
used.  While this won't necessarily stay this way in the future, we
currently cannot do anything about it.
> 
> Guenter

Regards
Frank
Guenter Roeck April 12, 2024, 12:48 p.m. UTC | #5
On Fri, Apr 12, 2024 at 08:11:11PM +1000, Frank Crawford wrote:
> > 
> > Not really. There is also the watchdog code which will happily
> > disable
> > SIO access after it is done, causing subsequent accesses by the hwmon
> > driver to fail. The code also assumes that SIO access was not
> > erroneously
> > left enabled by some other code which we don't have any control over.
> 
> And unfortunately if I can't do anything about it, I can only ignore
> it.  If something does come up we can see what can work out at the
> time.
> > 
> > You assume that the hwmon driver is the only driver accessing the
> > chip.
> > That is a wrong assumption. I understand that the underlying problem
> > is really that there is no SIO access infrastructure in the kernel.
> > In the absence of such an infrastructure we can not make any
> > assumptions
> > about SIO access control implemented by other drivers in the kernel,
> > and specifically can not assume that SIO access won't be disabled by
> > other drivers just because it was enabled when the hwmon driver probe
> > function was running.
> 
> In this case the fact that it is the second chip may mean it will not
> come up.  While I am told that the chip is fully functional with non-
> hwmon functions, but currently it does look like most of those aren't
> used.  While this won't necessarily stay this way in the future, we
> currently cannot do anything about it.

This patch affects all chips, not just the second one. If any chip is
in configuration mode when instantiating this driver, configuration mode
won't be enabled anymore, no matter what other drivers may or may not do.
That includes situations where other drivers (or the BIOS) erroneously
do not disable configuration mode.

I understand your reasoning about not enabling configuration mode for
certain chips, but that does not explain why it would be necessary
to do this for all chips all the time.

Sure, there is something we can do: Unless there is a known problem
that affects _all_ chips, drop this patch.

Thanks,
Guenter
Frank Crawford April 13, 2024, 8:47 a.m. UTC | #6
On Fri, 2024-04-12 at 05:48 -0700, Guenter Roeck wrote:
> On Fri, Apr 12, 2024 at 08:11:11PM +1000, Frank Crawford wrote:
> > > 
> > > Not really. There is also the watchdog code which will happily
> > > disable
> > > SIO access after it is done, causing subsequent accesses by the
> > > hwmon
> > > driver to fail. The code also assumes that SIO access was not
> > > erroneously
> > > left enabled by some other code which we don't have any control
> > > over.
> > 
> > And unfortunately if I can't do anything about it, I can only
> > ignore
> > it.  If something does come up we can see what can work out at the
> > time.
> > > 
> > > You assume that the hwmon driver is the only driver accessing the
> > > chip.
> > > That is a wrong assumption. I understand that the underlying
> > > problem
> > > is really that there is no SIO access infrastructure in the
> > > kernel.
> > > In the absence of such an infrastructure we can not make any
> > > assumptions
> > > about SIO access control implemented by other drivers in the
> > > kernel,
> > > and specifically can not assume that SIO access won't be disabled
> > > by
> > > other drivers just because it was enabled when the hwmon driver
> > > probe
> > > function was running.
> > 
> > In this case the fact that it is the second chip may mean it will
> > not
> > come up.  While I am told that the chip is fully functional with
> > non-
> > hwmon functions, but currently it does look like most of those
> > aren't
> > used.  While this won't necessarily stay this way in the future, we
> > currently cannot do anything about it.
> 
> This patch affects all chips, not just the second one. If any chip is
> in configuration mode when instantiating this driver, configuration
> mode
> won't be enabled anymore, no matter what other drivers may or may not
> do.
> That includes situations where other drivers (or the BIOS)
> erroneously
> do not disable configuration mode.
> 
> I understand your reasoning about not enabling configuration mode for
> certain chips, but that does not explain why it would be necessary
> to do this for all chips all the time.
> 
> Sure, there is something we can do: Unless there is a known problem
> that affects _all_ chips, drop this patch.

Ahh, if that is the impression you have from my description, then I
will need to improve it, as that is not the case.

The actual update does the following:

1) Lock the memory, but does not perform a SIO entry (previously it
would have performed an SIO entry).

2) Attempt to read the chipID.  This should be safe no matter which
chip we have.

3) If step (2) fails, then perform SIO entry and retry chipID read.  If
it fails, act similarly to prior to this patch.

4) Set the sio_data->type, similar to previously.

5) If we have not performed an SIO entry, and this is not a chip type
with the NOCONF feature, then it will perform an SIO entry at this
point.

6) Proceed setup as prior to this patch.

7) Any following access to the SIO registers will invoke the SIO entry
and SIO exit steps unless it is a chip with the NOCONF feature set. 
This was set up in the previous patches in this patchset.

8) There is also some minor update to the failure exit based on if it
had performed a SIO entry or not, in addition to the previous tests.

> 
> Thanks,
> Guenter

Regards
Frank
diff mbox series

Patch

diff --git a/drivers/hwmon/it87.c b/drivers/hwmon/it87.c
index 396c2d3afbf7..6a77f2f6e1e1 100644
--- a/drivers/hwmon/it87.c
+++ b/drivers/hwmon/it87.c
@@ -2667,6 +2667,27 @@  static const struct attribute_group it87_group_auto_pwm = {
 	.is_visible = it87_auto_pwm_is_visible,
 };
 
+/*
+ * Original explanation:
+ * On various Gigabyte AM4 boards (AB350, AX370), the second Super-IO chip
+ * (IT8792E) needs to be in configuration mode before accessing the first
+ * due to a bug in IT8792E which otherwise results in LPC bus access errors.
+ * This needs to be done before accessing the first Super-IO chip since
+ * the second chip may have been accessed prior to loading this driver.
+ *
+ * The problem is also reported to affect IT8795E, which is used on X299 boards
+ * and has the same chip ID as IT8792E (0x8733). It also appears to affect
+ * systems with IT8790E, which is used on some Z97X-Gaming boards as well as
+ * Z87X-OC.
+ *
+ * From other information supplied:
+ * ChipIDs 0x8733, 0x8695 (early ID for IT87952E) and 0x8790 are intialised
+ * by the BIOS and left in configuration mode, and entering and/or exiting
+ * configuration mode is what causes the crash.
+ *
+ * The recommendation is to look up the chipID before doing any mode swap
+ * and then act accordingly.
+ */
 /* SuperIO detection - will change isa_address if a chip is found */
 static int __init it87_find(int sioaddr, unsigned short *address,
 			    struct it87_sio_data *sio_data, int chip_cnt)
@@ -2674,16 +2695,25 @@  static int __init it87_find(int sioaddr, unsigned short *address,
 	int err;
 	u16 chip_type;
 	const struct it87_devices *config = NULL;
+	bool opened = false;
 
-	err = superio_enter(sioaddr, false);
+	/* First step, lock memory but don't enter configuration mode */
+	err = superio_enter(sioaddr, true);
 	if (err)
 		return err;
 
 	err = -ENODEV;
 	chip_type = superio_inw(sioaddr, DEVID);
-	/* check first for a valid chip before forcing chip id */
-	if (chip_type == 0xffff)
-		goto exit;
+	/* Check for a valid chip before forcing chip id */
+	if (chip_type == 0xffff) {
+		/* Enter configuration mode */
+		__superio_enter(sioaddr);
+		opened = true;
+		/* and then try again */
+		chip_type = superio_inw(sioaddr, DEVID);
+		if (chip_type == 0xffff)
+			goto exit;
+	}
 
 	if (force_id_cnt == 1) {
 		/* If only one value given use for all chips */
@@ -2767,6 +2797,18 @@  static int __init it87_find(int sioaddr, unsigned short *address,
 
 	config = &it87_devices[sio_data->type];
 
+	/*
+	 * If previously we didn't enter configuration mode and it isn't a
+	 * chip we know is initialised by the BIOS, then enter configuration
+	 * mode.
+	 *
+	 * I don't know if any such chips can exist but be defensive.
+	 */
+	if (!opened && !has_conf_biosopen(config)) {
+		__superio_enter(sioaddr);
+		opened = true;
+	}
+
 	superio_select(sioaddr, PME);
 	if (!(superio_inb(sioaddr, IT87_ACT_REG) & 0x01)) {
 		pr_info("Device (chip %s ioreg 0x%x) not activated, skipping\n",
@@ -3144,7 +3186,7 @@  static int __init it87_find(int sioaddr, unsigned short *address,
 	}
 
 exit:
-	superio_exit(sioaddr, config ? has_conf_biosopen(config) : false);
+	superio_exit(sioaddr, opened && config && has_conf_biosopen(config));
 	return err;
 }