Message ID | 20230210044826.9834-2-orlandoch.dev@gmail.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | apple-gmux: support MMIO gmux type on T2 Macs | expand |
Hi, On 2/10/23 05:48, Orlando Chamberlain wrote: > Currently it manually flips the byte order, but we can instead use > cpu_to_be32(val) for this. > > Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> > --- > drivers/platform/x86/apple-gmux.c | 18 ++---------------- > 1 file changed, 2 insertions(+), 16 deletions(-) > > diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c > index 9333f82cfa8a..e8cb084cb81f 100644 > --- a/drivers/platform/x86/apple-gmux.c > +++ b/drivers/platform/x86/apple-gmux.c > @@ -94,13 +94,7 @@ static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) > static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, > u32 val) > { > - int i; > - u8 tmpval; > - > - for (i = 0; i < 4; i++) { > - tmpval = (val >> (i * 8)) & 0xff; > - outb(tmpval, gmux_data->iostart + port + i); > - } > + outl(cpu_to_be32(val), gmux_data->iostart + port); > } > > static int gmux_index_wait_ready(struct apple_gmux_data *gmux_data) The ioport / indexed-ioport accessed apple_gmux-es likely are (part of?) LPC bus devices . Looking at the bus level you are now changing 4 io accesses with a size of 1 byte, to 1 32 bit io-access. Depending on the decoding hw in the chip this may work fine, or this may work not at all. I realized that you have asked for more testing, but most surviving macbooks from the older apple-gmux era appear to be models without a discrete GPU (which are often the first thing to break) and thus without a gmux. Unless we get a bunch of testers to show up, which I doubt. I would prefer slightly bigger / less pretty code and not change the functional behavior of the driver on these older models. Regards, Hans > @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct apple_gmux_data *gmux_data, int port) > static void gmux_index_write32(struct apple_gmux_data *gmux_data, int port, > u32 val) > { > - int i; > - u8 tmpval; > - > mutex_lock(&gmux_data->index_lock); > - > - for (i = 0; i < 4; i++) { > - tmpval = (val >> (i * 8)) & 0xff; > - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE + i); > - } > - > + outl(cpu_to_be32(val), gmux_data->iostart + GMUX_PORT_VALUE); > gmux_index_wait_ready(gmux_data); > outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); > gmux_index_wait_complete(gmux_data);
Hi, On 2/10/23 20:09, Hans de Goede wrote: > Hi, > > On 2/10/23 05:48, Orlando Chamberlain wrote: >> Currently it manually flips the byte order, but we can instead use >> cpu_to_be32(val) for this. >> >> Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> >> --- >> drivers/platform/x86/apple-gmux.c | 18 ++---------------- >> 1 file changed, 2 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c >> index 9333f82cfa8a..e8cb084cb81f 100644 >> --- a/drivers/platform/x86/apple-gmux.c >> +++ b/drivers/platform/x86/apple-gmux.c >> @@ -94,13 +94,7 @@ static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, >> u32 val) >> { >> - int i; >> - u8 tmpval; >> - >> - for (i = 0; i < 4; i++) { >> - tmpval = (val >> (i * 8)) & 0xff; >> - outb(tmpval, gmux_data->iostart + port + i); >> - } >> + outl(cpu_to_be32(val), gmux_data->iostart + port); >> } >> >> static int gmux_index_wait_ready(struct apple_gmux_data *gmux_data) > > The ioport / indexed-ioport accessed apple_gmux-es likely are (part of?) > LPC bus devices . Looking at the bus level you are now changing 4 io > accesses with a size of 1 byte, to 1 32 bit io-access. > > Depending on the decoding hw in the chip this may work fine, > or this may work not at all. > > I realized that you have asked for more testing, but most surviving > macbooks from the older apple-gmux era appear to be models without > a discrete GPU (which are often the first thing to break) and thus > without a gmux. > > Unless we get a bunch of testers to show up, which I doubt. I would > prefer slightly bigger / less pretty code and not change the functional > behavior of the driver on these older models. A quick follow up on this, I just noticed that only the pio_write32 is doing the one byte at a time thing: static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) { return inl(gmux_data->iostart + port); } static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, u32 val) { int i; u8 tmpval; for (i = 0; i < 4; i++) { tmpval = (val >> (i * 8)) & 0xff; outb(tmpval, gmux_data->iostart + port + i); } } And if you look closely gmux_pio_write32() is not swapping the order to be32 at all, it is just taking the bytes in little-endian memory order, starting with the first (index 0) byte which is the least significant byte of the value. On x86 the original code is no different then doing: static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, u32 val) { u8 *data = (u8 *)&val; int i; for (i = 0; i < 4; i++) outb(data[i], gmux_data->iostart + port + i); } So yeah this patch is definitely wrong, it actually swaps the byte order compared to the original code. Which becomes clear when you look the weird difference between the read32 and write32 functions after this patch. Presumably there is a specific reason why gmux_pio_write32() is not already doing a single outl(..., val) and byte-ordering is not the reason. Regards, Hans >> @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct apple_gmux_data *gmux_data, int port) >> static void gmux_index_write32(struct apple_gmux_data *gmux_data, int port, >> u32 val) >> { >> - int i; >> - u8 tmpval; >> - >> mutex_lock(&gmux_data->index_lock); >> - >> - for (i = 0; i < 4; i++) { >> - tmpval = (val >> (i * 8)) & 0xff; >> - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE + i); >> - } >> - >> + outl(cpu_to_be32(val), gmux_data->iostart + GMUX_PORT_VALUE); >> gmux_index_wait_ready(gmux_data); >> outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); >> gmux_index_wait_complete(gmux_data); >
Hi, On 2/10/23 20:09, Hans de Goede wrote: > Hi, > > On 2/10/23 05:48, Orlando Chamberlain wrote: >> Currently it manually flips the byte order, but we can instead use >> cpu_to_be32(val) for this. >> >> Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> >> --- >> drivers/platform/x86/apple-gmux.c | 18 ++---------------- >> 1 file changed, 2 insertions(+), 16 deletions(-) >> >> diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c >> index 9333f82cfa8a..e8cb084cb81f 100644 >> --- a/drivers/platform/x86/apple-gmux.c >> +++ b/drivers/platform/x86/apple-gmux.c >> @@ -94,13 +94,7 @@ static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, >> u32 val) >> { >> - int i; >> - u8 tmpval; >> - >> - for (i = 0; i < 4; i++) { >> - tmpval = (val >> (i * 8)) & 0xff; >> - outb(tmpval, gmux_data->iostart + port + i); >> - } >> + outl(cpu_to_be32(val), gmux_data->iostart + port); >> } >> >> static int gmux_index_wait_ready(struct apple_gmux_data *gmux_data) > > The ioport / indexed-ioport accessed apple_gmux-es likely are (part of?) > LPC bus devices . Looking at the bus level you are now changing 4 io > accesses with a size of 1 byte, to 1 32 bit io-access. Correction to myself, re-reading the LPC specification, then if I'm right and this is a LPC device then all IO in/out accesses are always 1 byte accesses. Since the LPC bus only supports 16 / 32 bit accesses for DMA cycles. So presumably the outl() would get split into 4 separate 8 bit (port) IO accesses. Regards, Hans >> @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct apple_gmux_data *gmux_data, int port) >> static void gmux_index_write32(struct apple_gmux_data *gmux_data, int port, >> u32 val) >> { >> - int i; >> - u8 tmpval; >> - >> mutex_lock(&gmux_data->index_lock); >> - >> - for (i = 0; i < 4; i++) { >> - tmpval = (val >> (i * 8)) & 0xff; >> - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE + i); >> - } >> - >> + outl(cpu_to_be32(val), gmux_data->iostart + GMUX_PORT_VALUE); >> gmux_index_wait_ready(gmux_data); >> outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); >> gmux_index_wait_complete(gmux_data); >
From: Hans de Goede > Sent: 10 February 2023 19:33 > > Hi, > > On 2/10/23 20:09, Hans de Goede wrote: > > Hi, > > > > On 2/10/23 05:48, Orlando Chamberlain wrote: > >> Currently it manually flips the byte order, but we can instead use > >> cpu_to_be32(val) for this. > >> > >> Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> > >> --- > >> drivers/platform/x86/apple-gmux.c | 18 ++---------------- > >> 1 file changed, 2 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c > >> index 9333f82cfa8a..e8cb084cb81f 100644 > >> --- a/drivers/platform/x86/apple-gmux.c > >> +++ b/drivers/platform/x86/apple-gmux.c > >> @@ -94,13 +94,7 @@ static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) > >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, > >> u32 val) > >> { > >> - int i; > >> - u8 tmpval; > >> - > >> - for (i = 0; i < 4; i++) { > >> - tmpval = (val >> (i * 8)) & 0xff; > >> - outb(tmpval, gmux_data->iostart + port + i); > >> - } > >> + outl(cpu_to_be32(val), gmux_data->iostart + port); > >> } > >> > >> static int gmux_index_wait_ready(struct apple_gmux_data *gmux_data) > > > > The ioport / indexed-ioport accessed apple_gmux-es likely are (part of?) > > LPC bus devices . Looking at the bus level you are now changing 4 io > > accesses with a size of 1 byte, to 1 32 bit io-access. > > Correction to myself, re-reading the LPC specification, then > if I'm right and this is a LPC device then all IO in/out accesses > are always 1 byte accesses. Since the LPC bus only supports 16 / 32 > bit accesses for DMA cycles. > > So presumably the outl() would get split into 4 separate 8 bit > (port) IO accesses. I wonder if there is something obscure and the order of the 4 bytes writes matters? In any case writing as: xxxx iostart = gmux_data->iostart + port; outb(val, iostart); outb(val >> 8, iostart + 1); outb(val >> 16, iostart + 2); outb(val >> 24, ioctart + 3); almost certainly generates better code. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On Fri, 10 Feb 2023 20:19:27 +0100 Hans de Goede <hdegoede@redhat.com> wrote: > Hi, > > On 2/10/23 20:09, Hans de Goede wrote: > > Hi, > > > > On 2/10/23 05:48, Orlando Chamberlain wrote: > >> Currently it manually flips the byte order, but we can instead use > >> cpu_to_be32(val) for this. > >> > >> Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> > >> --- > >> drivers/platform/x86/apple-gmux.c | 18 ++---------------- > >> 1 file changed, 2 insertions(+), 16 deletions(-) > >> > >> diff --git a/drivers/platform/x86/apple-gmux.c > >> b/drivers/platform/x86/apple-gmux.c index > >> 9333f82cfa8a..e8cb084cb81f 100644 --- > >> a/drivers/platform/x86/apple-gmux.c +++ > >> b/drivers/platform/x86/apple-gmux.c @@ -94,13 +94,7 @@ static u32 > >> gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) > >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, > >> int port, u32 val) { > >> - int i; > >> - u8 tmpval; > >> - > >> - for (i = 0; i < 4; i++) { > >> - tmpval = (val >> (i * 8)) & 0xff; > >> - outb(tmpval, gmux_data->iostart + port + i); > >> - } > >> + outl(cpu_to_be32(val), gmux_data->iostart + port); > >> } > >> > >> static int gmux_index_wait_ready(struct apple_gmux_data > >> *gmux_data) > > > > The ioport / indexed-ioport accessed apple_gmux-es likely are (part > > of?) LPC bus devices . Looking at the bus level you are now > > changing 4 io accesses with a size of 1 byte, to 1 32 bit io-access. > > > > Depending on the decoding hw in the chip this may work fine, > > or this may work not at all. > > > > I realized that you have asked for more testing, but most surviving > > macbooks from the older apple-gmux era appear to be models without > > a discrete GPU (which are often the first thing to break) and thus > > without a gmux. > > > > Unless we get a bunch of testers to show up, which I doubt. I would > > prefer slightly bigger / less pretty code and not change the > > functional behavior of the driver on these older models. > > A quick follow up on this, I just noticed that only the pio_write32 > is doing the one byte at a time thing: > > static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int > port) { > return inl(gmux_data->iostart + port); > } > > static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int > port, u32 val) > { > int i; > u8 tmpval; > > for (i = 0; i < 4; i++) { > tmpval = (val >> (i * 8)) & 0xff; > outb(tmpval, gmux_data->iostart + port + i); > } > } > > And if you look closely gmux_pio_write32() is not swapping > the order to be32 at all, it is just taking the bytes > in little-endian memory order, starting with the first > (index 0) byte which is the least significant byte of > the value. > > On x86 the original code is no different then doing: > > static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int > port, u32 val) > { > u8 *data = (u8 *)&val; > int i; > > for (i = 0; i < 4; i++) > outb(data[i], gmux_data->iostart + port + i); > } > > So yeah this patch is definitely wrong, it actually swaps > the byte order compared to the original code. Which becomes > clear when you look the weird difference between the read32 and > write32 functions after this patch. > > Presumably there is a specific reason why gmux_pio_write32() > is not already doing a single outl(..., val) and byte-ordering > is not the reason. > > Regards, > > Hans Sounds like it may be better to just drop this patch as there's very little benefit for the risk of causing a regression. > > > > >> @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct > >> apple_gmux_data *gmux_data, int port) static void > >> gmux_index_write32(struct apple_gmux_data *gmux_data, int port, > >> u32 val) { > >> - int i; > >> - u8 tmpval; > >> - > >> mutex_lock(&gmux_data->index_lock); > >> - > >> - for (i = 0; i < 4; i++) { > >> - tmpval = (val >> (i * 8)) & 0xff; > >> - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE > >> + i); > >> - } > >> - > >> + outl(cpu_to_be32(val), gmux_data->iostart + > >> GMUX_PORT_VALUE); gmux_index_wait_ready(gmux_data); > >> outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); > >> gmux_index_wait_complete(gmux_data); > > >
Hi, On 2/11/23 00:30, Orlando Chamberlain wrote: > On Fri, 10 Feb 2023 20:19:27 +0100 > Hans de Goede <hdegoede@redhat.com> wrote: > >> Hi, >> >> On 2/10/23 20:09, Hans de Goede wrote: >>> Hi, >>> >>> On 2/10/23 05:48, Orlando Chamberlain wrote: >>>> Currently it manually flips the byte order, but we can instead use >>>> cpu_to_be32(val) for this. >>>> >>>> Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> >>>> --- >>>> drivers/platform/x86/apple-gmux.c | 18 ++---------------- >>>> 1 file changed, 2 insertions(+), 16 deletions(-) >>>> >>>> diff --git a/drivers/platform/x86/apple-gmux.c >>>> b/drivers/platform/x86/apple-gmux.c index >>>> 9333f82cfa8a..e8cb084cb81f 100644 --- >>>> a/drivers/platform/x86/apple-gmux.c +++ >>>> b/drivers/platform/x86/apple-gmux.c @@ -94,13 +94,7 @@ static u32 >>>> gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) >>>> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, >>>> int port, u32 val) { >>>> - int i; >>>> - u8 tmpval; >>>> - >>>> - for (i = 0; i < 4; i++) { >>>> - tmpval = (val >> (i * 8)) & 0xff; >>>> - outb(tmpval, gmux_data->iostart + port + i); >>>> - } >>>> + outl(cpu_to_be32(val), gmux_data->iostart + port); >>>> } >>>> >>>> static int gmux_index_wait_ready(struct apple_gmux_data >>>> *gmux_data) >>> >>> The ioport / indexed-ioport accessed apple_gmux-es likely are (part >>> of?) LPC bus devices . Looking at the bus level you are now >>> changing 4 io accesses with a size of 1 byte, to 1 32 bit io-access. >>> >>> Depending on the decoding hw in the chip this may work fine, >>> or this may work not at all. >>> >>> I realized that you have asked for more testing, but most surviving >>> macbooks from the older apple-gmux era appear to be models without >>> a discrete GPU (which are often the first thing to break) and thus >>> without a gmux. >>> >>> Unless we get a bunch of testers to show up, which I doubt. I would >>> prefer slightly bigger / less pretty code and not change the >>> functional behavior of the driver on these older models. >> >> A quick follow up on this, I just noticed that only the pio_write32 >> is doing the one byte at a time thing: >> >> static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int >> port) { >> return inl(gmux_data->iostart + port); >> } >> >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int >> port, u32 val) >> { >> int i; >> u8 tmpval; >> >> for (i = 0; i < 4; i++) { >> tmpval = (val >> (i * 8)) & 0xff; >> outb(tmpval, gmux_data->iostart + port + i); >> } >> } >> >> And if you look closely gmux_pio_write32() is not swapping >> the order to be32 at all, it is just taking the bytes >> in little-endian memory order, starting with the first >> (index 0) byte which is the least significant byte of >> the value. >> >> On x86 the original code is no different then doing: >> >> static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int >> port, u32 val) >> { >> u8 *data = (u8 *)&val; >> int i; >> >> for (i = 0; i < 4; i++) >> outb(data[i], gmux_data->iostart + port + i); >> } >> >> So yeah this patch is definitely wrong, it actually swaps >> the byte order compared to the original code. Which becomes >> clear when you look the weird difference between the read32 and >> write32 functions after this patch. >> >> Presumably there is a specific reason why gmux_pio_write32() >> is not already doing a single outl(..., val) and byte-ordering >> is not the reason. >> >> Regards, >> >> Hans > > Sounds like it may be better to just drop this patch as there's very > little benefit for the risk of causing a regression. Yes if it is easy to drop this please drop this. And the same more or less applies to 2/9. I would rather have an extra "if () ... else ..." in the code in a couple of places then change behavior on old hw where we cannot get proper test coverage (but will likely eventually get regressions reports if we break things). Thanks & Regards, Hans >>>> @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct >>>> apple_gmux_data *gmux_data, int port) static void >>>> gmux_index_write32(struct apple_gmux_data *gmux_data, int port, >>>> u32 val) { >>>> - int i; >>>> - u8 tmpval; >>>> - >>>> mutex_lock(&gmux_data->index_lock); >>>> - >>>> - for (i = 0; i < 4; i++) { >>>> - tmpval = (val >> (i * 8)) & 0xff; >>>> - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE >>>> + i); >>>> - } >>>> - >>>> + outl(cpu_to_be32(val), gmux_data->iostart + >>>> GMUX_PORT_VALUE); gmux_index_wait_ready(gmux_data); >>>> outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); >>>> gmux_index_wait_complete(gmux_data); >>> >> >
diff --git a/drivers/platform/x86/apple-gmux.c b/drivers/platform/x86/apple-gmux.c index 9333f82cfa8a..e8cb084cb81f 100644 --- a/drivers/platform/x86/apple-gmux.c +++ b/drivers/platform/x86/apple-gmux.c @@ -94,13 +94,7 @@ static u32 gmux_pio_read32(struct apple_gmux_data *gmux_data, int port) static void gmux_pio_write32(struct apple_gmux_data *gmux_data, int port, u32 val) { - int i; - u8 tmpval; - - for (i = 0; i < 4; i++) { - tmpval = (val >> (i * 8)) & 0xff; - outb(tmpval, gmux_data->iostart + port + i); - } + outl(cpu_to_be32(val), gmux_data->iostart + port); } static int gmux_index_wait_ready(struct apple_gmux_data *gmux_data) @@ -177,16 +171,8 @@ static u32 gmux_index_read32(struct apple_gmux_data *gmux_data, int port) static void gmux_index_write32(struct apple_gmux_data *gmux_data, int port, u32 val) { - int i; - u8 tmpval; - mutex_lock(&gmux_data->index_lock); - - for (i = 0; i < 4; i++) { - tmpval = (val >> (i * 8)) & 0xff; - outb(tmpval, gmux_data->iostart + GMUX_PORT_VALUE + i); - } - + outl(cpu_to_be32(val), gmux_data->iostart + GMUX_PORT_VALUE); gmux_index_wait_ready(gmux_data); outb(port & 0xff, gmux_data->iostart + GMUX_PORT_WRITE); gmux_index_wait_complete(gmux_data);
Currently it manually flips the byte order, but we can instead use cpu_to_be32(val) for this. Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com> --- drivers/platform/x86/apple-gmux.c | 18 ++---------------- 1 file changed, 2 insertions(+), 16 deletions(-)