Message ID | e28c4b4deaf766910c366ab87b64325da59c8ad6.1443198783.git.viresh.kumar@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, 2015-09-25 at 09:41 -0700, Viresh Kumar wrote: > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > V3->V4: > - Create a local variable instead of changing type of global_lock > (Rafael) Err, surely that wasn't what Rafael meant, since it's clearly impossible to use a pointer to the stack, assign to it once, and the expect anything to wkr at all ... johannes
On 25-09-15, 19:42, Johannes Berg wrote: > On Fri, 2015-09-25 at 09:41 -0700, Viresh Kumar wrote: > > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > --- > > V3->V4: > > - Create a local variable instead of changing type of global_lock > > (Rafael) > > Err, surely that wasn't what Rafael meant, since it's clearly > impossible to use a pointer to the stack, assign to it once, and the > expect anything to wkr at all ... Sorry, I am not sure on what wouldn't work with this patch but this is what Rafael said, and I don't think I wrote it differently :) Rafael wrote: > Actually, what about adding a local u32 variable, say val, here and doing > > > if (!debugfs_create_x32("gpe", 0444, dev_dir, (u32 *)&first_ec->gpe)) > > goto error; > > if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, > > - (u32 *)&first_ec->global_lock)) > > + &first_ec->global_lock)) > > if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, &val)) > > > goto error; > > first_ec->global_lock = val; > > And then you can turn val into bool just fine without changing the structure > definition.
> Rafael wrote: > > Actually, what about adding a local u32 variable, say val, here and > > doing > > > > > if (!debugfs_create_x32("gpe", 0444, dev_dir, (u32 > > > *)&first_ec->gpe)) > > > goto error; > > > if (!debugfs_create_bool("use_global_lock", 0444, > > > dev_dir, > > > - (u32 *)&first_ec->global_lock)) > > > + &first_ec->global_lock)) > > > > if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, > > &val)) > > > > > goto error; > > > > first_ec->global_lock = val; > > > > And then you can turn val into bool just fine without changing the > > structure > > definition. Ok, then, but that means Rafael is completely wrong ... debugfs_create_bool() takes a *pointer* and it needs to be long-lived, it can't be on the stack. You also don't get a call when it changes. If you cannot change the struct definition then you must implement a debugfs file with its own read/write handlers. johannes
On 25-09-15, 20:49, Johannes Berg wrote: > Ok, then, but that means Rafael is completely wrong ... > debugfs_create_bool() takes a *pointer* and it needs to be long-lived, > it can't be on the stack. You also don't get a call when it changes. Ahh, ofcourse. My bad as well... I think we can change structure definition but will wait for Rafael's comment before that.
On Friday, September 25, 2015 09:41:37 AM Viresh Kumar wrote: > global_lock is defined as an unsigned long and accessing only its lower > 32 bits from sysfs is incorrect, as we need to consider other 32 bits > for big endian 64 bit systems. There are no such platforms yet, but the > code needs to be robust for such a case. > > Fix that by passing a local variable to debugfs_create_bool() and > assigning its value to global_lock later. > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Greg, please take this one if the [2/2] looks good to you. > --- > V3->V4: > - Create a local variable instead of changing type of global_lock > (Rafael) > - Drop the stable tag > - BCC'd a lot of people (rather than cc'ing them) to make sure > - the series reaches them > - mailing lists do not block the patchset due to long cc list > - and we don't spam the BCC'd people for every reply > --- > drivers/acpi/ec_sys.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c > index b4c216bab22b..b44b91331a56 100644 > --- a/drivers/acpi/ec_sys.c > +++ b/drivers/acpi/ec_sys.c > @@ -110,6 +110,7 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) > struct dentry *dev_dir; > char name[64]; > umode_t mode = 0400; > + u32 val; > > if (ec_device_count == 0) { > acpi_ec_debugfs_dir = debugfs_create_dir("ec", NULL); > @@ -127,10 +128,11 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) > > if (!debugfs_create_x32("gpe", 0444, dev_dir, (u32 *)&first_ec->gpe)) > goto error; > - if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, > - (u32 *)&first_ec->global_lock)) > + if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, &val)) > goto error; > > + first_ec->global_lock = val; > + > if (write_support) > mode = 0600; > if (!debugfs_create_file("io", mode, dev_dir, ec, &acpi_ec_io_ops)) > Thanks, Rafael
On Friday, September 25, 2015 10:18:13 PM Rafael J. Wysocki wrote: > On Friday, September 25, 2015 09:41:37 AM Viresh Kumar wrote: > > global_lock is defined as an unsigned long and accessing only its lower > > 32 bits from sysfs is incorrect, as we need to consider other 32 bits > > for big endian 64 bit systems. There are no such platforms yet, but the > > code needs to be robust for such a case. > > > > Fix that by passing a local variable to debugfs_create_bool() and > > assigning its value to global_lock later. > > > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Greg, please take this one if the [2/2] looks good to you. Ouch, turns out it was a bad idea. Please scratch that. Thanks, Rafael
On 25 September 2015 at 13:33, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > You're going to change that into bool in the next patch, right? Yeah. > So what if bool is a byte and the field is not word-aligned Its between two 'unsigned long' variables today, and the struct isn't packed. So, it will be aligned, isn't it? > and changing > that byte requires a read-modify-write. How do we ensure that things remain > consistent in that case? I didn't understood why a read-modify-write is special here? That's what will happen to most of the non-word-sized fields anyway? Probably I didn't understood what you meant.. -- viresh
On Friday, September 25, 2015 11:52:56 AM Viresh Kumar wrote: > On 25-09-15, 20:49, Johannes Berg wrote: > > Ok, then, but that means Rafael is completely wrong ... > > debugfs_create_bool() takes a *pointer* and it needs to be long-lived, > > it can't be on the stack. You also don't get a call when it changes. > > Ahh, ofcourse. My bad as well... Well, sorry about the wrong suggestion. > I think we can change structure definition but will wait for Rafael's > comment before that. OK, change the structure then. Thanks, Rafael
On Friday, September 25, 2015 10:26:22 PM Rafael J. Wysocki wrote: > On Friday, September 25, 2015 11:52:56 AM Viresh Kumar wrote: > > On 25-09-15, 20:49, Johannes Berg wrote: > > > Ok, then, but that means Rafael is completely wrong ... > > > debugfs_create_bool() takes a *pointer* and it needs to be long-lived, > > > it can't be on the stack. You also don't get a call when it changes. > > > > Ahh, ofcourse. My bad as well... > > Well, sorry about the wrong suggestion. > > > I think we can change structure definition but will wait for Rafael's > > comment before that. > > OK, change the structure then. But here's a question. You're going to change that into bool in the next patch, right? So what if bool is a byte and the field is not word-aligned and changing that byte requires a read-modify-write. How do we ensure that things remain consistent in that case? Thanks, Rafael
On Friday, September 25, 2015 01:25:49 PM Viresh Kumar wrote: > On 25 September 2015 at 13:33, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > You're going to change that into bool in the next patch, right? > > Yeah. > > > So what if bool is a byte and the field is not word-aligned > > Its between two 'unsigned long' variables today, and the struct isn't packed. > So, it will be aligned, isn't it? > > > and changing > > that byte requires a read-modify-write. How do we ensure that things remain > > consistent in that case? > > I didn't understood why a read-modify-write is special here? That's > what will happen > to most of the non-word-sized fields anyway? > > Probably I didn't understood what you meant.. Say you have three adjacent fields in a structure, x, y, z, each one byte long. Initially, all of them are equal to 0. CPU A writes 1 to x and CPU B writes 2 to y at the same time. What's the result? Thanks, Rafael
On 25-09-15, 22:58, Rafael J. Wysocki wrote: > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > Initially, all of them are equal to 0. > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > What's the result? But then two CPUs can update the same variable as well, and we must have proper locking in place to fix that. Anyway, that problem isn't here for sure as its between two unsigned-longs. So, should I just move it to bool and resend ?
On Fri, Sep 25, 2015 at 11:44 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote: > On 25-09-15, 22:58, Rafael J. Wysocki wrote: >> Say you have three adjacent fields in a structure, x, y, z, each one byte long. >> Initially, all of them are equal to 0. >> >> CPU A writes 1 to x and CPU B writes 2 to y at the same time. >> >> What's the result? > > But then two CPUs can update the same variable as well, and we must > have proper locking in place to fix that. Right. So if you allow something like debugfs to update your structure, how do you make sure there is the proper locking? Is that not a problem in all of the places modified by the [2/2]? How can the future users of the API know what to do to avoid potential problems with it? > > Anyway, that problem isn't here for sure as its between two > unsigned-longs. So, should I just move it to bool and resend ? I guess it might be more convenient to fold this into the other patch, because we seem to be splitting hairs here. Thanks, Rafael
On 25 September 2015 at 15:19, Rafael J. Wysocki <rafael@kernel.org> wrote: > So if you allow something like debugfs to update your structure, how > do you make sure there is the proper locking? Not really sure at all.. Isn't there some debugfs locking that will jump in, to avoid updation of fields to the same device? > Is that not a problem in all of the places modified by the [2/2]? Right, but its not new AFAICT. We already have u32 fields in those structs and on 64 bit machines we have the same read-update-write problems for two consecutive u32's. Right? > How can the future users of the API know what to do to avoid potential > problems with it? No idea really :) >> Anyway, that problem isn't here for sure as its between two >> unsigned-longs. So, should I just move it to bool and resend ? > > I guess it might be more convenient to fold this into the other patch, > because we seem to be splitting hairs here. I can and that's what I did. But then Arnd asked me to separate it out. I can fold it back if that's what you want. -- viresh
On Saturday 26 September 2015 11:40:00 Viresh Kumar wrote: > On 25 September 2015 at 15:19, Rafael J. Wysocki <rafael@kernel.org> wrote: > > So if you allow something like debugfs to update your structure, how > > do you make sure there is the proper locking? > > Not really sure at all.. Isn't there some debugfs locking that will > jump in, to avoid updation of fields to the same device? No, if you need any locking to access variable, you cannot use the simple debugfs helpers but have to provide your own functions. > >> Anyway, that problem isn't here for sure as its between two > >> unsigned-longs. So, should I just move it to bool and resend ? > > > > I guess it might be more convenient to fold this into the other patch, > > because we seem to be splitting hairs here. > > I can and that's what I did. But then Arnd asked me to separate it > out. I can fold it back if that's what you want. It still makes sense to keep it separate I think, the patch is clearly different from the other parts. Arnd
On Fri, 2015-09-25 at 22:58 +0200, Rafael J. Wysocki wrote: > On Friday, September 25, 2015 01:25:49 PM Viresh Kumar wrote: > > On 25 September 2015 at 13:33, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > You're going to change that into bool in the next patch, right? > > > > Yeah. > > > > > So what if bool is a byte and the field is not word-aligned > > > > Its between two 'unsigned long' variables today, and the struct isn't packed. > > So, it will be aligned, isn't it? > > > > > and changing > > > that byte requires a read-modify-write. How do we ensure that things remain > > > consistent in that case? > > > > I didn't understood why a read-modify-write is special here? That's > > what will happen > > to most of the non-word-sized fields anyway? > > > > Probably I didn't understood what you meant.. > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > Initially, all of them are equal to 0. > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > What's the result? I think every CPU's cache architecure guarantees adjacent store integrity, even in the face of SMP, so it's x==1 and y==2. If you're thinking of old alpha SMP system where the lowest store width is 32 bits and thus you have to do RMW to update a byte, this was usually fixed by padding (assuming the structure is not packed). However, it was such a problem that even the later alpha chips had byte extensions. James
Dne 25. 9. 2015 18:42 napsal uživatel "Viresh Kumar" < viresh.kumar@linaro.org>: > > global_lock is defined as an unsigned long and accessing only its lower > 32 bits from sysfs is incorrect, as we need to consider other 32 bits > for big endian 64 bit systems. There are no such platforms yet, but the > code needs to be robust for such a case. > > Fix that by passing a local variable to debugfs_create_bool() and > assigning its value to global_lock later. But this has to crash whenever the file is read as val's storage is gone at that moment already, right? > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- > V3->V4: > - Create a local variable instead of changing type of global_lock > (Rafael) > - Drop the stable tag > - BCC'd a lot of people (rather than cc'ing them) to make sure > - the series reaches them > - mailing lists do not block the patchset due to long cc list > - and we don't spam the BCC'd people for every reply > --- > drivers/acpi/ec_sys.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c > index b4c216bab22b..b44b91331a56 100644 > --- a/drivers/acpi/ec_sys.c > +++ b/drivers/acpi/ec_sys.c > @@ -110,6 +110,7 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) > struct dentry *dev_dir; > char name[64]; > umode_t mode = 0400; > + u32 val; > > if (ec_device_count == 0) { > acpi_ec_debugfs_dir = debugfs_create_dir("ec", NULL); > @@ -127,10 +128,11 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) > > if (!debugfs_create_x32("gpe", 0444, dev_dir, (u32 *)&first_ec->gpe)) > goto error; > - if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, > - (u32 *)&first_ec->global_lock)) > + if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, &val)) > goto error; > > + first_ec->global_lock = val; > + > if (write_support) > mode = 0600; > if (!debugfs_create_file("io", mode, dev_dir, ec, &acpi_ec_io_ops)) > -- > 2.4.0 >
On Saturday, September 26, 2015 12:52:08 PM James Bottomley wrote: > On Fri, 2015-09-25 at 22:58 +0200, Rafael J. Wysocki wrote: > > On Friday, September 25, 2015 01:25:49 PM Viresh Kumar wrote: > > > On 25 September 2015 at 13:33, Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > > > You're going to change that into bool in the next patch, right? > > > > > > Yeah. > > > > > > > So what if bool is a byte and the field is not word-aligned > > > > > > Its between two 'unsigned long' variables today, and the struct isn't packed. > > > So, it will be aligned, isn't it? > > > > > > > and changing > > > > that byte requires a read-modify-write. How do we ensure that things remain > > > > consistent in that case? > > > > > > I didn't understood why a read-modify-write is special here? That's > > > what will happen > > > to most of the non-word-sized fields anyway? > > > > > > Probably I didn't understood what you meant.. > > > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > > Initially, all of them are equal to 0. > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > > > What's the result? > > I think every CPU's cache architecure guarantees adjacent store > integrity, even in the face of SMP, so it's x==1 and y==2. If you're > thinking of old alpha SMP system where the lowest store width is 32 bits > and thus you have to do RMW to update a byte, this was usually fixed by > padding (assuming the structure is not packed). However, it was such a > problem that even the later alpha chips had byte extensions. OK, thanks! Rafael
On Saturday, September 26, 2015 09:33:56 PM Arnd Bergmann wrote: > On Saturday 26 September 2015 11:40:00 Viresh Kumar wrote: > > On 25 September 2015 at 15:19, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > So if you allow something like debugfs to update your structure, how > > > do you make sure there is the proper locking? > > > > Not really sure at all.. Isn't there some debugfs locking that will > > jump in, to avoid updation of fields to the same device? > > No, if you need any locking to access variable, you cannot use the > simple debugfs helpers but have to provide your own functions. > > > >> Anyway, that problem isn't here for sure as its between two > > >> unsigned-longs. So, should I just move it to bool and resend ? > > > > > > I guess it might be more convenient to fold this into the other patch, > > > because we seem to be splitting hairs here. > > > > I can and that's what I did. But then Arnd asked me to separate it > > out. I can fold it back if that's what you want. > > It still makes sense to keep it separate I think, the patch is clearly > different from the other parts. I just don't see much point in going from unsigned long to u32 and then from 32 to bool if we can go directly to bool in one go. Thanks, Rafael
On 26 September 2015 at 22:31, Jiri Slaby <jirislaby@gmail.com> wrote: > But this has to crash whenever the file is read as val's storage is gone at > that moment already, right? Yeah, its fixed now in the new version. This was a *really* bad idea :(
On Sunday 27 September 2015 16:10:48 Rafael J. Wysocki wrote: > On Saturday, September 26, 2015 09:33:56 PM Arnd Bergmann wrote: > > On Saturday 26 September 2015 11:40:00 Viresh Kumar wrote: > > > On 25 September 2015 at 15:19, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > So if you allow something like debugfs to update your structure, how > > > > do you make sure there is the proper locking? > > > > > > Not really sure at all.. Isn't there some debugfs locking that will > > > jump in, to avoid updation of fields to the same device? > > > > No, if you need any locking to access variable, you cannot use the > > simple debugfs helpers but have to provide your own functions. > > > > > >> Anyway, that problem isn't here for sure as its between two > > > >> unsigned-longs. So, should I just move it to bool and resend ? > > > > > > > > I guess it might be more convenient to fold this into the other patch, > > > > because we seem to be splitting hairs here. > > > > > > I can and that's what I did. But then Arnd asked me to separate it > > > out. I can fold it back if that's what you want. > > > > It still makes sense to keep it separate I think, the patch is clearly > > different from the other parts. > > I just don't see much point in going from unsigned long to u32 and then > from 32 to bool if we can go directly to bool in one go. It's only important to keep the 34-file multi-subsystem trivial cleanup that doesn't change any functionality separate from the bugfix. If you like to avoid patching one of the files twice, the alternative would be to first change the API for all other instances from u32 to bool and leave ACPI alone, and then do the second patch that changes ACPI from long to bool. Arnd
From: Rafael J. Wysocki > Sent: 27 September 2015 15:09 ... > > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > > > Initially, all of them are equal to 0. > > > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > > > > > What's the result? > > > > I think every CPU's cache architecure guarantees adjacent store > > integrity, even in the face of SMP, so it's x==1 and y==2. If you're > > thinking of old alpha SMP system where the lowest store width is 32 bits > > and thus you have to do RMW to update a byte, this was usually fixed by > > padding (assuming the structure is not packed). However, it was such a > > problem that even the later alpha chips had byte extensions. Does linux still support those old Alphas? The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. > OK, thanks! You still have to ensure the compiler doesn't do wider rmw cycles. I believe the recent versions of gcc won't do wider accesses for volatile data. David
On Monday, September 28, 2015 10:24:58 AM Arnd Bergmann wrote: > On Sunday 27 September 2015 16:10:48 Rafael J. Wysocki wrote: > > On Saturday, September 26, 2015 09:33:56 PM Arnd Bergmann wrote: > > > On Saturday 26 September 2015 11:40:00 Viresh Kumar wrote: > > > > On 25 September 2015 at 15:19, Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > So if you allow something like debugfs to update your structure, how > > > > > do you make sure there is the proper locking? > > > > > > > > Not really sure at all.. Isn't there some debugfs locking that will > > > > jump in, to avoid updation of fields to the same device? > > > > > > No, if you need any locking to access variable, you cannot use the > > > simple debugfs helpers but have to provide your own functions. > > > > > > > >> Anyway, that problem isn't here for sure as its between two > > > > >> unsigned-longs. So, should I just move it to bool and resend ? > > > > > > > > > > I guess it might be more convenient to fold this into the other patch, > > > > > because we seem to be splitting hairs here. > > > > > > > > I can and that's what I did. But then Arnd asked me to separate it > > > > out. I can fold it back if that's what you want. > > > > > > It still makes sense to keep it separate I think, the patch is clearly > > > different from the other parts. > > > > I just don't see much point in going from unsigned long to u32 and then > > from 32 to bool if we can go directly to bool in one go. > > It's only important to keep the 34-file multi-subsystem trivial cleanup > that doesn't change any functionality separate from the bugfix. Which isn't a bugfix really, because the EC code is not run on any big-endian systems to my knowledge. And it won't matter after the [2/2] anyway. And the changelog of it doesn't really makes sense, because it talks about future systems, but after the [2/2] no future systems will run that code in the first place. > If you like to avoid patching one of the files twice, the alternative would > be to first change the API for all other instances from u32 to bool > and leave ACPI alone, and then do the second patch that changes ACPI > from long to bool. My point is that this patch doesn't matter. It doesn't really fix anything and the result of it goes away after the second patch. The only marginal value of having it as a separate commit is in case if (a) we need to revert the [2/2] for some reason and (b) ACPI-based ARM systems (the big-endian ones) become full-hardware at one point. You know what the chances of that are, though. :-) That said I've ACKed the patch, because I don't care that much. I'm not exactly sure why you care either. Thanks, Rafael
On Mon, 2015-09-28 at 08:58 +0000, David Laight wrote: > From: Rafael J. Wysocki > > Sent: 27 September 2015 15:09 > ... > > > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > > > > Initially, all of them are equal to 0. > > > > > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > > > > > > > What's the result? > > > > > > I think every CPU's cache architecure guarantees adjacent store > > > integrity, even in the face of SMP, so it's x==1 and y==2. If you're > > > thinking of old alpha SMP system where the lowest store width is 32 bits > > > and thus you have to do RMW to update a byte, this was usually fixed by > > > padding (assuming the structure is not packed). However, it was such a > > > problem that even the later alpha chips had byte extensions. > > Does linux still support those old Alphas? > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. That's different: it's an atomic RMW operation. The problem with the alpha was that the operation wasn't atomic (meaning that it can't be interrupted and no intermediate output states are visible). > > OK, thanks! > > You still have to ensure the compiler doesn't do wider rmw cycles. > I believe the recent versions of gcc won't do wider accesses for volatile data. I don't understand this comment. You seem to be implying gcc would do a 64 bit RMW for a 32 bit store ... that would be daft when a single instruction exists to perform the operation on all architectures. James
From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com] > Sent: 28 September 2015 15:27 > On Mon, 2015-09-28 at 08:58 +0000, David Laight wrote: > > From: Rafael J. Wysocki > > > Sent: 27 September 2015 15:09 > > ... > > > > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > > > > > Initially, all of them are equal to 0. > > > > > > > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > > > > > > > > > What's the result? > > > > > > > > I think every CPU's cache architecure guarantees adjacent store > > > > integrity, even in the face of SMP, so it's x==1 and y==2. If you're > > > > thinking of old alpha SMP system where the lowest store width is 32 bits > > > > and thus you have to do RMW to update a byte, this was usually fixed by > > > > padding (assuming the structure is not packed). However, it was such a > > > > problem that even the later alpha chips had byte extensions. > > > > Does linux still support those old Alphas? > > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. > > That's different: it's an atomic RMW operation. The problem with the > alpha was that the operation wasn't atomic (meaning that it can't be > interrupted and no intermediate output states are visible). It is only atomic if prefixed by the 'lock' prefix. Normally the read and write are separate bus cycles. > > You still have to ensure the compiler doesn't do wider rmw cycles. > > I believe the recent versions of gcc won't do wider accesses for volatile data. > > I don't understand this comment. You seem to be implying gcc would do a > 64 bit RMW for a 32 bit store ... that would be daft when a single > instruction exists to perform the operation on all architectures. Read the object code and weep... It is most likely to happen for operations that are rmw (eg bit set). For instance the arm cpu has limited offsets for 16bit accesses, for normal structures the compiler is likely to use a 32bit rmw sequence for a 16bit field that has a large offset. The C language allows the compiler to do it for any access (IIRC including volatiles). David
On Mon, 2015-09-28 at 14:50 +0000, David Laight wrote: > From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com] > > Sent: 28 September 2015 15:27 > > On Mon, 2015-09-28 at 08:58 +0000, David Laight wrote: > > > From: Rafael J. Wysocki > > > > Sent: 27 September 2015 15:09 > > > ... > > > > > > Say you have three adjacent fields in a structure, x, y, z, each one byte long. > > > > > > Initially, all of them are equal to 0. > > > > > > > > > > > > CPU A writes 1 to x and CPU B writes 2 to y at the same time. > > > > > > > > > > > > What's the result? > > > > > > > > > > I think every CPU's cache architecure guarantees adjacent store > > > > > integrity, even in the face of SMP, so it's x==1 and y==2. If you're > > > > > thinking of old alpha SMP system where the lowest store width is 32 bits > > > > > and thus you have to do RMW to update a byte, this was usually fixed by > > > > > padding (assuming the structure is not packed). However, it was such a > > > > > problem that even the later alpha chips had byte extensions. > > > > > > Does linux still support those old Alphas? > > > > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. > > > > That's different: it's an atomic RMW operation. The problem with the > > alpha was that the operation wasn't atomic (meaning that it can't be > > interrupted and no intermediate output states are visible). > > It is only atomic if prefixed by the 'lock' prefix. > Normally the read and write are separate bus cycles. The essential point is that x86 has atomic bit ops and byte writes. Early alpha did not. > > > You still have to ensure the compiler doesn't do wider rmw cycles. > > > I believe the recent versions of gcc won't do wider accesses for volatile data. > > > > I don't understand this comment. You seem to be implying gcc would do a > > 64 bit RMW for a 32 bit store ... that would be daft when a single > > instruction exists to perform the operation on all architectures. > > Read the object code and weep... > It is most likely to happen for operations that are rmw (eg bit set). > For instance the arm cpu has limited offsets for 16bit accesses, for > normal structures the compiler is likely to use a 32bit rmw sequence > for a 16bit field that has a large offset. > The C language allows the compiler to do it for any access (IIRC including > volatiles). I think you might be confusing different things. Most RISC CPUs can't do 32 bit store immediates because there aren't enough bits in their arsenal, so they tend to split 32 bit loads into a left and right part (first the top then the offset). This (and other things) are mostly what you see in code. However, 32 bit register stores are still atomic, which is all we require. It's not really the compiler's fault, it's mostly an architectural limitation. James
From: James Bottomley > Sent: 28 September 2015 16:12 > > > > The x86 cpus will also do 32bit wide rmw cycles for the 'bit' operations. > > > > > > That's different: it's an atomic RMW operation. The problem with the > > > alpha was that the operation wasn't atomic (meaning that it can't be > > > interrupted and no intermediate output states are visible). > > > > It is only atomic if prefixed by the 'lock' prefix. > > Normally the read and write are separate bus cycles. > > The essential point is that x86 has atomic bit ops and byte writes. > Early alpha did not. Early alpha didn't have any byte accesses. On x86 if you have the following: struct { char a; volatile char b; } *foo; foo->a |= 4; The compiler is likely to generate a 'bis #4, 0(rbx)' (or similar) and the cpu will do two 32bit memory cycles that read and write the 'volatile' field 'b'. (gcc definitely used to do this...) A lot of fields were made 32bit (and probably not bitfields) in the linux kernel tree a year or two ago to avoid this very problem. > > > > You still have to ensure the compiler doesn't do wider rmw cycles. > > > > I believe the recent versions of gcc won't do wider accesses for volatile data. > > > > > > I don't understand this comment. You seem to be implying gcc would do a > > > 64 bit RMW for a 32 bit store ... that would be daft when a single > > > instruction exists to perform the operation on all architectures. > > > > Read the object code and weep... > > It is most likely to happen for operations that are rmw (eg bit set). > > For instance the arm cpu has limited offsets for 16bit accesses, for > > normal structures the compiler is likely to use a 32bit rmw sequence > > for a 16bit field that has a large offset. > > The C language allows the compiler to do it for any access (IIRC including > > volatiles). > > I think you might be confusing different things. Most RISC CPUs can't > do 32 bit store immediates because there aren't enough bits in their > arsenal, so they tend to split 32 bit loads into a left and right part > (first the top then the offset). This (and other things) are mostly > what you see in code. However, 32 bit register stores are still atomic, > which is all we require. It's not really the compiler's fault, it's > mostly an architectural limitation. No, I'm not talking about how 32bit constants are generated. I'm talking about structure offsets. David
diff --git a/drivers/acpi/ec_sys.c b/drivers/acpi/ec_sys.c index b4c216bab22b..b44b91331a56 100644 --- a/drivers/acpi/ec_sys.c +++ b/drivers/acpi/ec_sys.c @@ -110,6 +110,7 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) struct dentry *dev_dir; char name[64]; umode_t mode = 0400; + u32 val; if (ec_device_count == 0) { acpi_ec_debugfs_dir = debugfs_create_dir("ec", NULL); @@ -127,10 +128,11 @@ static int acpi_ec_add_debugfs(struct acpi_ec *ec, unsigned int ec_device_count) if (!debugfs_create_x32("gpe", 0444, dev_dir, (u32 *)&first_ec->gpe)) goto error; - if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, - (u32 *)&first_ec->global_lock)) + if (!debugfs_create_bool("use_global_lock", 0444, dev_dir, &val)) goto error; + first_ec->global_lock = val; + if (write_support) mode = 0600; if (!debugfs_create_file("io", mode, dev_dir, ec, &acpi_ec_io_ops))
global_lock is defined as an unsigned long and accessing only its lower 32 bits from sysfs is incorrect, as we need to consider other 32 bits for big endian 64 bit systems. There are no such platforms yet, but the code needs to be robust for such a case. Fix that by passing a local variable to debugfs_create_bool() and assigning its value to global_lock later. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- V3->V4: - Create a local variable instead of changing type of global_lock (Rafael) - Drop the stable tag - BCC'd a lot of people (rather than cc'ing them) to make sure - the series reaches them - mailing lists do not block the patchset due to long cc list - and we don't spam the BCC'd people for every reply --- drivers/acpi/ec_sys.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)