diff mbox series

[-v2] Resource: fix region_intersects() for CXL memory

Message ID 20240819023413.1109779-1-ying.huang@intel.com
State New
Headers show
Series [-v2] Resource: fix region_intersects() for CXL memory | expand

Commit Message

Huang, Ying Aug. 19, 2024, 2:34 a.m. UTC
On a system with CXL memory installed, the resource tree (/proc/iomem)
related to CXL memory looks like something as follows.

490000000-50fffffff : CXL Window 0
  490000000-50fffffff : region0
    490000000-50fffffff : dax0.0
      490000000-50fffffff : System RAM (kmem)

When the following command line is run to try writing some memory in
CXL memory range,

 $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
 dd: error writing '/dev/mem': Bad address
 1+0 records in
 0+0 records out
 0 bytes copied, 0.0283507 s, 0.0 kB/s

the command fails as expected.  However, the error code is wrong.  It
should be "Operation not permitted" instead of "Bad address".  And,
the following warning is reported in kernel log.

 ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
 WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
 Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
 CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
 RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
 Code: 2d 80 3d 24 6a a1 02 00 75 c1 48 8d 54 24 70 48 8d b4 24 90 00 00 00 48 c7 c7 40 3a 05 8a c6 05 07 6a a1 02 01 e8 a3 a0 01 00 <0f> 0b eb 9d 48 8b 84 24 90 00 00 00 48 8d 4c 24 60 89 ea 48 bf 00
 RSP: 0018:ffff888105387bf0 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: 0000000490000fff RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffed1020a70f73
 RBP: 0000000000000000 R08: ffffed100d9fce92 R09: 0000000000000001
 R10: ffffffff892348e7 R11: ffffed100d9fce91 R12: 0000000490000000
 R13: 0000000000000001 R14: 0000000000000001 R15: ffff888105387ca0
 FS:  00007f86c438c740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055ba75b1b818 CR3: 0000000005231000 CR4: 0000000000350eb0
 Call Trace:
  <TASK>
  ? __warn+0xd7/0x1b8
  ? __ioremap_caller.constprop.0+0x131/0x35d
  ? report_bug+0x136/0x19e
  ? __ioremap_caller.constprop.0+0x131/0x35d
  ? handle_bug+0x3c/0x64
  ? exc_invalid_op+0x13/0x38
  ? asm_exc_invalid_op+0x16/0x20
  ? irq_work_claim+0x16/0x38
  ? __ioremap_caller.constprop.0+0x131/0x35d
  ? tracer_hardirqs_off+0x18/0x16d
  ? kmem_cache_debug_flags+0x16/0x23
  ? memremap+0xcb/0x184
  ? iounmap+0xfe/0xfe
  ? lock_sync+0xc7/0xc7
  ? lock_sync+0xc7/0xc7
  ? rcu_is_watching+0x1c/0x38
  ? do_raw_read_unlock+0x37/0x42
  ? _raw_read_unlock+0x1f/0x2f
  memremap+0xcb/0x184
  ? pfn_valid+0x159/0x159
  ? resource_is_exclusive+0xba/0xc5
  xlate_dev_mem_ptr+0x25/0x2f
  write_mem+0x94/0xfb
  vfs_write+0x128/0x26d
  ? kernel_write+0x89/0x89
  ? rcu_is_watching+0x1c/0x38
  ? __might_fault+0x72/0xba
  ? __might_fault+0x72/0xba
  ? rcu_is_watching+0x1c/0x38
  ? lock_release+0xba/0x13e
  ? files_lookup_fd_raw+0x40/0x4b
  ? __fget_light+0x64/0x89
  ksys_write+0xac/0xfe
  ? __ia32_sys_read+0x40/0x40
  ? tracer_hardirqs_off+0x18/0x16d
  ? tracer_hardirqs_on+0x11/0x146
  do_syscall_64+0x9a/0xfd
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
 RIP: 0033:0x7f86c4487140
 Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
 RSP: 002b:00007ffca9f62af8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007f86c4487140
 RDX: 0000000000000400 RSI: 000055ba75b1a000 RDI: 0000000000000001
 RBP: 000055ba75b1a000 R08: 0000000000000000 R09: 00007f86c457c080
 R10: 00007f86c43a84d0 R11: 0000000000000202 R12: 0000000000000000
 R13: 0000000000000000 R14: 000055ba75b1a000 R15: 0000000000000400
  </TASK>
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 hardirqs last disabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
 softirqs last  enabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
 softirqs last disabled at (0): [<0000000000000000>] 0x0

After investigation, we found the following bug.

In the above resource tree, "System RAM" is a descendant of "CXL
Window 0" instead of a top level resource.  So, region_intersects()
will report no System RAM resources in the CXL memory region
incorrectly, because it only checks the top level resources.
Consequently, devmem_is_allowed() will return 1 (allow access via
/dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
doesn't allow to map System RAM and reject the access.

However, region_intersects() needs to be fixed to work correctly with
the resources tree with CXL Window as above.  To fix it, if we found a
unmatched resource in the top level, we will continue to search
matched resources in its descendant resources.  So, we will not miss
any matched resources in resource tree anymore.  In the new
implementation,

|------------- "CXL Window 0" ------------|
|-- "System RAM" --|

will look as if

|-- "System RAM" --||-- "CXL Window 0a" --|

in effect.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Baoquan He <bhe@redhat.com>
---
Changelogs:

v2:

- Fix a bug which occurs when descendant of a matched resource matches.

- Revise the patch description and comments to make the algorithm clearer.
  Thanks Dan and David's comments!

- Link to v1: https://lore.kernel.org/linux-mm/20240816020723.771196-1-ying.huang@intel.com/

 kernel/resource.c | 53 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 7 deletions(-)

Comments

Andy Shevchenko Aug. 19, 2024, 8:13 a.m. UTC | #1
On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
> On a system with CXL memory installed, the resource tree (/proc/iomem)
> related to CXL memory looks like something as follows.
> 
> 490000000-50fffffff : CXL Window 0
>   490000000-50fffffff : region0
>     490000000-50fffffff : dax0.0
>       490000000-50fffffff : System RAM (kmem)
> 
> When the following command line is run to try writing some memory in
> CXL memory range,
> 
>  $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
>  dd: error writing '/dev/mem': Bad address
>  1+0 records in
>  0+0 records out
>  0 bytes copied, 0.0283507 s, 0.0 kB/s
> 
> the command fails as expected.  However, the error code is wrong.  It
> should be "Operation not permitted" instead of "Bad address".  And,
> the following warning is reported in kernel log.

>  ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
>  WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
>  Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
>  CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
>  Code: 2d 80 3d 24 6a a1 02 00 75 c1 48 8d 54 24 70 48 8d b4 24 90 00 00 00 48 c7 c7 40 3a 05 8a c6 05 07 6a a1 02 01 e8 a3 a0 01 00 <0f> 0b eb 9d 48 8b 84 24 90 00 00 00 48 8d 4c 24 60 89 ea 48 bf 00
>  RSP: 0018:ffff888105387bf0 EFLAGS: 00010282
>  RAX: 0000000000000000 RBX: 0000000490000fff RCX: 0000000000000000
>  RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffed1020a70f73
>  RBP: 0000000000000000 R08: ffffed100d9fce92 R09: 0000000000000001
>  R10: ffffffff892348e7 R11: ffffed100d9fce91 R12: 0000000490000000
>  R13: 0000000000000001 R14: 0000000000000001 R15: ffff888105387ca0
>  FS:  00007f86c438c740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 000055ba75b1b818 CR3: 0000000005231000 CR4: 0000000000350eb0
>  Call Trace:
>   <TASK>
>   ? __warn+0xd7/0x1b8
>   ? __ioremap_caller.constprop.0+0x131/0x35d
>   ? report_bug+0x136/0x19e
>   ? __ioremap_caller.constprop.0+0x131/0x35d
>   ? handle_bug+0x3c/0x64
>   ? exc_invalid_op+0x13/0x38
>   ? asm_exc_invalid_op+0x16/0x20
>   ? irq_work_claim+0x16/0x38
>   ? __ioremap_caller.constprop.0+0x131/0x35d
>   ? tracer_hardirqs_off+0x18/0x16d
>   ? kmem_cache_debug_flags+0x16/0x23
>   ? memremap+0xcb/0x184
>   ? iounmap+0xfe/0xfe
>   ? lock_sync+0xc7/0xc7
>   ? lock_sync+0xc7/0xc7
>   ? rcu_is_watching+0x1c/0x38
>   ? do_raw_read_unlock+0x37/0x42
>   ? _raw_read_unlock+0x1f/0x2f
>   memremap+0xcb/0x184
>   ? pfn_valid+0x159/0x159
>   ? resource_is_exclusive+0xba/0xc5
>   xlate_dev_mem_ptr+0x25/0x2f
>   write_mem+0x94/0xfb
>   vfs_write+0x128/0x26d
>   ? kernel_write+0x89/0x89
>   ? rcu_is_watching+0x1c/0x38
>   ? __might_fault+0x72/0xba
>   ? __might_fault+0x72/0xba
>   ? rcu_is_watching+0x1c/0x38
>   ? lock_release+0xba/0x13e
>   ? files_lookup_fd_raw+0x40/0x4b
>   ? __fget_light+0x64/0x89
>   ksys_write+0xac/0xfe
>   ? __ia32_sys_read+0x40/0x40
>   ? tracer_hardirqs_off+0x18/0x16d
>   ? tracer_hardirqs_on+0x11/0x146
>   do_syscall_64+0x9a/0xfd
>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>  RIP: 0033:0x7f86c4487140
>  Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
>  RSP: 002b:00007ffca9f62af8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>  RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007f86c4487140
>  RDX: 0000000000000400 RSI: 000055ba75b1a000 RDI: 0000000000000001
>  RBP: 000055ba75b1a000 R08: 0000000000000000 R09: 00007f86c457c080
>  R10: 00007f86c43a84d0 R11: 0000000000000202 R12: 0000000000000000
>  R13: 0000000000000000 R14: 000055ba75b1a000 R15: 0000000000000400
>   </TASK>
>  irq event stamp: 0
>  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>  hardirqs last disabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>  softirqs last  enabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>  softirqs last disabled at (0): [<0000000000000000>] 0x0

Submitting Patches documentation suggests how to shrink the above to make it
somewhat readable and important.

> After investigation, we found the following bug.
> 
> In the above resource tree, "System RAM" is a descendant of "CXL
> Window 0" instead of a top level resource.  So, region_intersects()
> will report no System RAM resources in the CXL memory region
> incorrectly, because it only checks the top level resources.
> Consequently, devmem_is_allowed() will return 1 (allow access via
> /dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
> doesn't allow to map System RAM and reject the access.
> 
> However, region_intersects() needs to be fixed to work correctly with
> the resources tree with CXL Window as above.  To fix it, if we found a
> unmatched resource in the top level, we will continue to search
> matched resources in its descendant resources.  So, we will not miss
> any matched resources in resource tree anymore.  In the new
> implementation,
> 
> |------------- "CXL Window 0" ------------|
> |-- "System RAM" --|
> 
> will look as if
> 
> |-- "System RAM" --||-- "CXL Window 0a" --|
> 
> in effect.

> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Baoquan He <bhe@redhat.com>

You may move Cc list after '---', so it won't unnecessarily pollute the commit
message.

> ---
> Changelogs:
> 
> v2:
> 
> - Fix a bug which occurs when descendant of a matched resource matches.
> 
> - Revise the patch description and comments to make the algorithm clearer.
>   Thanks Dan and David's comments!
> 
> - Link to v1: https://lore.kernel.org/linux-mm/20240816020723.771196-1-ying.huang@intel.com/

...

>  {
>  	struct resource res;
>  	int type = 0; int other = 0;
> -	struct resource *p;
> +	struct resource *p, *dp;
> +	resource_size_t ostart, oend;
> +	bool is_type, covered;

Maybe keep more reversed xmas tree ordering?

>  	res.start = start;
>  	res.end = start + size - 1;
>  
>  	for (p = parent->child; p ; p = p->sibling) {
> -		bool is_type = (((p->flags & flags) == flags) &&
> -				((desc == IORES_DESC_NONE) ||
> -				 (desc == p->desc)));
> -
> -		if (resource_overlaps(p, &res))
> -			is_type ? type++ : other++;
> +		if (!resource_overlaps(p, &res))
> +			continue;
> +		is_type = (((p->flags & flags) == flags) &&
> +			   ((desc == IORES_DESC_NONE) || (desc == p->desc)));

In the original code and here the most outer parentheses are redundant.

> +		if (is_type) {
> +			type++;
> +			continue;
> +		}
> +		/*
> +		 * Continue to search in descendant resources as if the
> +		 * matched descendant resources cover some ranges of 'p'.
> +		 *
> +		 * |------------- "CXL Window 0" ------------|
> +		 * |-- "System RAM" --|
> +		 *
> +		 * looks as if
> +		 *
> +		 * |-- "System RAM" --||-- "CXL Window 0a" --|
> +		 *
> +		 * in effect.
> +		 */
> +		covered = false;

> +		ostart = max(res.start, p->start);
> +		oend = min(res.end, p->end);

Isn't a reinvention of resource_intersection()? With that in place you may also
drop the above resource_overlaps().

> +		for_each_resource(p, dp, false) {
> +			if (!resource_overlaps(dp, &res))
> +				continue;
> +			is_type = (((dp->flags & flags) == flags) &&
> +				   ((desc == IORES_DESC_NONE) ||
> +				    (desc == dp->desc)));

Most outer parentheses are redundant. With them being dropped you also may
unite the last two lines into a single one.

> +			if (is_type) {
> +				type++;
> +				if (dp->start > ostart)
> +					break;
> +				if (dp->end >= oend) {
> +					covered = true;
> +					break;
> +				}
> +				ostart = max(ostart, dp->end + 1);
> +			}
> +		}
> +		if (!covered)
> +			other++;
>  	}
Bjorn Helgaas Aug. 21, 2024, 6:46 p.m. UTC | #2
On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
> On a system with CXL memory installed, the resource tree (/proc/iomem)
> related to CXL memory looks like something as follows.
> 
> 490000000-50fffffff : CXL Window 0
>   490000000-50fffffff : region0
>     490000000-50fffffff : dax0.0
>       490000000-50fffffff : System RAM (kmem)

I think the subject is too specific (the problem is something to do
with the tree topology, not the fact that it's "CXL memory") and at
the same time not specific enough ("fix" doesn't say anything about
what was wrong or how it is fixed).

IMO it could be improved by saying something about what is different
about CXL, e.g., maybe it could mention checking children in addition
to top-level resources.

> When the following command line is run to try writing some memory in
> CXL memory range,
> 
>  $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
>  dd: error writing '/dev/mem': Bad address
>  1+0 records in
>  0+0 records out
>  0 bytes copied, 0.0283507 s, 0.0 kB/s

Took me a minute, but I guess the connection is that
19136512 * 1k = 0x490000000, which is the beginning of the CXL Window.

> the command fails as expected.  However, the error code is wrong.  It
> should be "Operation not permitted" instead of "Bad address".  And,
> the following warning is reported in kernel log.

This intro makes it sound like the problem being solved is the error
code being wrong.  But it seems like a more serious problem than that.

>  ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff

Incidental: it seems a little weird that this warning only exists on
x86 and mips (and powerpc32 has a similar warning with different
wording), but I assume we don't want to ioremap RAM on *any*
architecture?

>  WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
>  Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
>  CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
> ...

> In the above resource tree, "System RAM" is a descendant of "CXL
> Window 0" instead of a top level resource.  So, region_intersects()
> will report no System RAM resources in the CXL memory region
> incorrectly, because it only checks the top level resources.
> Consequently, devmem_is_allowed() will return 1 (allow access via
> /dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
> doesn't allow to map System RAM and reject the access.
> 
> However, region_intersects() needs to be fixed to work correctly with
> the resources tree with CXL Window as above.  To fix it, if we found a
> unmatched resource in the top level, we will continue to search
> matched resources in its descendant resources.  So, we will not miss
> any matched resources in resource tree anymore.  In the new
> implementation,
> 
> |------------- "CXL Window 0" ------------|
> |-- "System RAM" --|
> 
> will look as if
> 
> |-- "System RAM" --||-- "CXL Window 0a" --|

Where did "0a" come from?  The /proc/iomem above mentioned
"CXL Window 0"; is the "a" spurious?  Same question applies to the
code comment below.

> in effect.
> +		 * |------------- "CXL Window 0" ------------|
> +		 * |-- "System RAM" --|
> +		 *
> +		 * looks as if
> +		 *
> +		 * |-- "System RAM" --||-- "CXL Window 0a" --|
> +		 *
> +		 * in effect.
Dan Williams Aug. 22, 2024, 1:43 a.m. UTC | #3
Hi Bjorn,

Ying is out for the next week or so, so I will address your comments and
resubmit as I think this is a potentially urgent fix.

Bjorn Helgaas wrote:
> On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
> > On a system with CXL memory installed, the resource tree (/proc/iomem)
> > related to CXL memory looks like something as follows.
> > 
> > 490000000-50fffffff : CXL Window 0
> >   490000000-50fffffff : region0
> >     490000000-50fffffff : dax0.0
> >       490000000-50fffffff : System RAM (kmem)
> 
> I think the subject is too specific (the problem is something to do
> with the tree topology, not the fact that it's "CXL memory") and at
> the same time not specific enough ("fix" doesn't say anything about
> what was wrong or how it is fixed).

Agree, I will update this to be:

kernel/resource: Fix region_intersects() vs add_memory_driver_managed()

> IMO it could be improved by saying something about what is different
> about CXL, e.g., maybe it could mention checking children in addition
> to top-level resources.

CXL is but one source of a resource tree topology where "System RAM" is
a descendant of some other resource. I will fix up this changelog to
make it clear that dax/kmem and add_memory_driver_managed() potentiall
confuses region_intersects() in all cases since "System RAM" is never
one of the resources passed in to add_memory_driver_managed().

> > When the following command line is run to try writing some memory in
> > CXL memory range,
> > 
> >  $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
> >  dd: error writing '/dev/mem': Bad address
> >  1+0 records in
> >  0+0 records out
> >  0 bytes copied, 0.0283507 s, 0.0 kB/s
> 
> Took me a minute, but I guess the connection is that
> 19136512 * 1k = 0x490000000, which is the beginning of the CXL Window.

Yeah, so might as well write this in a way that makes that association
clearer:

$ dd if=data of=/dev/mem bs=$((1 << 10)) seek=$((0x490000000 >> 10)) count=1

> > the command fails as expected.  However, the error code is wrong.  It
> > should be "Operation not permitted" instead of "Bad address".  And,
> > the following warning is reported in kernel log.
> 
> This intro makes it sound like the problem being solved is the error
> code being wrong.  But it seems like a more serious problem than that.

The concern was that this bug allowed System RAM protection bypass. That
does not seem to be the case on x86, but the worry is that other archs
are not saved in the same way and /dev/mem protections are impacted.

> >  ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
> 
> Incidental: it seems a little weird that this warning only exists on
> x86 and mips (and powerpc32 has a similar warning with different
> wording), but I assume we don't want to ioremap RAM on *any*
> architecture?

Put another way, we want "System RAM" presence to always fail
devmem_is_allowed() anywhere that "System RAM" appears in the ancestry.

> >  WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
> >  Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
> >  CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
> >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> >  RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
> > ...
> 
> > In the above resource tree, "System RAM" is a descendant of "CXL
> > Window 0" instead of a top level resource.  So, region_intersects()
> > will report no System RAM resources in the CXL memory region
> > incorrectly, because it only checks the top level resources.
> > Consequently, devmem_is_allowed() will return 1 (allow access via
> > /dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
> > doesn't allow to map System RAM and reject the access.
> > 
> > However, region_intersects() needs to be fixed to work correctly with
> > the resources tree with CXL Window as above.  To fix it, if we found a
> > unmatched resource in the top level, we will continue to search
> > matched resources in its descendant resources.  So, we will not miss
> > any matched resources in resource tree anymore.  In the new
> > implementation,
> > 
> > |------------- "CXL Window 0" ------------|
> > |-- "System RAM" --|
> > 
> > will look as if
> > 
> > |-- "System RAM" --||-- "CXL Window 0a" --|
> 
> Where did "0a" come from?  The /proc/iomem above mentioned
> "CXL Window 0"; is the "a" spurious?  Same question applies to the
> code comment below.

Not sure where that came from, will clean up and provide a test that can
upstreammed as well.
Bjorn Helgaas Aug. 22, 2024, 9:29 p.m. UTC | #4
On Wed, Aug 21, 2024 at 06:43:43PM -0700, Dan Williams wrote:
> Hi Bjorn,
> 
> Ying is out for the next week or so, so I will address your comments and
> resubmit as I think this is a potentially urgent fix.

Sounds good, thanks.

Bjorn
Huang, Ying Aug. 30, 2024, 6:43 a.m. UTC | #5
Hi, Dan,

Dan Williams <dan.j.williams@intel.com> writes:

> Hi Bjorn,
>
> Ying is out for the next week or so, so I will address your comments and
> resubmit as I think this is a potentially urgent fix.

Just come back.  Thank you very much for your kind help!

> Bjorn Helgaas wrote:
>> On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
>> > On a system with CXL memory installed, the resource tree (/proc/iomem)

[snip]

>> > In the above resource tree, "System RAM" is a descendant of "CXL
>> > Window 0" instead of a top level resource.  So, region_intersects()
>> > will report no System RAM resources in the CXL memory region
>> > incorrectly, because it only checks the top level resources.
>> > Consequently, devmem_is_allowed() will return 1 (allow access via
>> > /dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
>> > doesn't allow to map System RAM and reject the access.
>> > 
>> > However, region_intersects() needs to be fixed to work correctly with
>> > the resources tree with CXL Window as above.  To fix it, if we found a
>> > unmatched resource in the top level, we will continue to search
>> > matched resources in its descendant resources.  So, we will not miss
>> > any matched resources in resource tree anymore.  In the new
>> > implementation,
>> > 
>> > |------------- "CXL Window 0" ------------|
>> > |-- "System RAM" --|
>> > 
>> > will look as if
>> > 
>> > |-- "System RAM" --||-- "CXL Window 0a" --|
>> 
>> Where did "0a" come from?  The /proc/iomem above mentioned
>> "CXL Window 0"; is the "a" spurious?  Same question applies to the
>> code comment below.
>
> Not sure where that came from, will clean up and provide a test that can
> upstreammed as well.

Sorry for confusing!  

>> > |-- "System RAM" --||-- "CXL Window 0a" --|

This isn't the real resource tree.  The real resource tree is still

>> > |------------- "CXL Window 0" ------------|
>> > |-- "System RAM" --|

While,

>> > |-- "System RAM" --||-- "CXL Window 0a" --|

is just used to demonstrate the effect of the algorithm.  That is, a
part of "CXL Window 0" is covered (or masked) by "System RAM" now.  The
remaining effective part of "CXL Window 0" is smaller, I used "CXL
Window 0a" to designate that it comes from "CXL Window 0", but it's not
exactly "CXL Window 0".

--
Best Regards,
Huang, Ying
Huang, Ying Sept. 4, 2024, 7:48 a.m. UTC | #6
Hi, Andy,

Thanks for your comments!

Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

> On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
>> On a system with CXL memory installed, the resource tree (/proc/iomem)
>> related to CXL memory looks like something as follows.
>> 
>> 490000000-50fffffff : CXL Window 0
>>   490000000-50fffffff : region0
>>     490000000-50fffffff : dax0.0
>>       490000000-50fffffff : System RAM (kmem)
>> 
>> When the following command line is run to try writing some memory in
>> CXL memory range,
>> 
>>  $ dd if=data of=/dev/mem bs=1k seek=19136512 count=1
>>  dd: error writing '/dev/mem': Bad address
>>  1+0 records in
>>  0+0 records out
>>  0 bytes copied, 0.0283507 s, 0.0 kB/s
>> 
>> the command fails as expected.  However, the error code is wrong.  It
>> should be "Operation not permitted" instead of "Bad address".  And,
>> the following warning is reported in kernel log.
>
>>  ioremap on RAM at 0x0000000490000000 - 0x0000000490000fff
>>  WARNING: CPU: 2 PID: 416 at arch/x86/mm/ioremap.c:216 __ioremap_caller.constprop.0+0x131/0x35d
>>  Modules linked in: cxl_pmem libnvdimm cbc encrypted_keys cxl_pmu
>>  CPU: 2 UID: 0 PID: 416 Comm: dd Not tainted 6.11.0-rc3-kvm #40
>>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>>  RIP: 0010:__ioremap_caller.constprop.0+0x131/0x35d
>>  Code: 2d 80 3d 24 6a a1 02 00 75 c1 48 8d 54 24 70 48 8d b4 24 90 00 00 00 48 c7 c7 40 3a 05 8a c6 05 07 6a a1 02 01 e8 a3 a0 01 00 <0f> 0b eb 9d 48 8b 84 24 90 00 00 00 48 8d 4c 24 60 89 ea 48 bf 00
>>  RSP: 0018:ffff888105387bf0 EFLAGS: 00010282
>>  RAX: 0000000000000000 RBX: 0000000490000fff RCX: 0000000000000000
>>  RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffffed1020a70f73
>>  RBP: 0000000000000000 R08: ffffed100d9fce92 R09: 0000000000000001
>>  R10: ffffffff892348e7 R11: ffffed100d9fce91 R12: 0000000490000000
>>  R13: 0000000000000001 R14: 0000000000000001 R15: ffff888105387ca0
>>  FS:  00007f86c438c740(0000) GS:ffff88806ce00000(0000) knlGS:0000000000000000
>>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  CR2: 000055ba75b1b818 CR3: 0000000005231000 CR4: 0000000000350eb0
>>  Call Trace:
>>   <TASK>
>>   ? __warn+0xd7/0x1b8
>>   ? __ioremap_caller.constprop.0+0x131/0x35d
>>   ? report_bug+0x136/0x19e
>>   ? __ioremap_caller.constprop.0+0x131/0x35d
>>   ? handle_bug+0x3c/0x64
>>   ? exc_invalid_op+0x13/0x38
>>   ? asm_exc_invalid_op+0x16/0x20
>>   ? irq_work_claim+0x16/0x38
>>   ? __ioremap_caller.constprop.0+0x131/0x35d
>>   ? tracer_hardirqs_off+0x18/0x16d
>>   ? kmem_cache_debug_flags+0x16/0x23
>>   ? memremap+0xcb/0x184
>>   ? iounmap+0xfe/0xfe
>>   ? lock_sync+0xc7/0xc7
>>   ? lock_sync+0xc7/0xc7
>>   ? rcu_is_watching+0x1c/0x38
>>   ? do_raw_read_unlock+0x37/0x42
>>   ? _raw_read_unlock+0x1f/0x2f
>>   memremap+0xcb/0x184
>>   ? pfn_valid+0x159/0x159
>>   ? resource_is_exclusive+0xba/0xc5
>>   xlate_dev_mem_ptr+0x25/0x2f
>>   write_mem+0x94/0xfb
>>   vfs_write+0x128/0x26d
>>   ? kernel_write+0x89/0x89
>>   ? rcu_is_watching+0x1c/0x38
>>   ? __might_fault+0x72/0xba
>>   ? __might_fault+0x72/0xba
>>   ? rcu_is_watching+0x1c/0x38
>>   ? lock_release+0xba/0x13e
>>   ? files_lookup_fd_raw+0x40/0x4b
>>   ? __fget_light+0x64/0x89
>>   ksys_write+0xac/0xfe
>>   ? __ia32_sys_read+0x40/0x40
>>   ? tracer_hardirqs_off+0x18/0x16d
>>   ? tracer_hardirqs_on+0x11/0x146
>>   do_syscall_64+0x9a/0xfd
>>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>  RIP: 0033:0x7f86c4487140
>>  Code: 40 00 48 8b 15 c1 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
>>  RSP: 002b:00007ffca9f62af8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>  RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007f86c4487140
>>  RDX: 0000000000000400 RSI: 000055ba75b1a000 RDI: 0000000000000001
>>  RBP: 000055ba75b1a000 R08: 0000000000000000 R09: 00007f86c457c080
>>  R10: 00007f86c43a84d0 R11: 0000000000000202 R12: 0000000000000000
>>  R13: 0000000000000000 R14: 000055ba75b1a000 R15: 0000000000000400
>>   </TASK>
>>  irq event stamp: 0
>>  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
>>  hardirqs last disabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>>  softirqs last  enabled at (0): [<ffffffff89091e85>] copy_process+0xb60/0x255f
>>  softirqs last disabled at (0): [<0000000000000000>] 0x0
>
> Submitting Patches documentation suggests how to shrink the above to make it
> somewhat readable and important.

Thanks for the pointer.  Will do that in the next version.

>> After investigation, we found the following bug.
>> 
>> In the above resource tree, "System RAM" is a descendant of "CXL
>> Window 0" instead of a top level resource.  So, region_intersects()
>> will report no System RAM resources in the CXL memory region
>> incorrectly, because it only checks the top level resources.
>> Consequently, devmem_is_allowed() will return 1 (allow access via
>> /dev/mem) for CXL memory region incorrectly.  Fortunately, ioremap()
>> doesn't allow to map System RAM and reject the access.
>> 
>> However, region_intersects() needs to be fixed to work correctly with
>> the resources tree with CXL Window as above.  To fix it, if we found a
>> unmatched resource in the top level, we will continue to search
>> matched resources in its descendant resources.  So, we will not miss
>> any matched resources in resource tree anymore.  In the new
>> implementation,
>> 
>> |------------- "CXL Window 0" ------------|
>> |-- "System RAM" --|
>> 
>> will look as if
>> 
>> |-- "System RAM" --||-- "CXL Window 0a" --|
>> 
>> in effect.
>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Davidlohr Bueso <dave@stgolabs.net>
>> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Cc: Dave Jiang <dave.jiang@intel.com>
>> Cc: Alison Schofield <alison.schofield@intel.com>
>> Cc: Vishal Verma <vishal.l.verma@intel.com>
>> Cc: Ira Weiny <ira.weiny@intel.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Baoquan He <bhe@redhat.com>
>
> You may move Cc list after '---', so it won't unnecessarily pollute the commit
> message.

Emm... It appears that it's a common practice to include "Cc" in the
commit log.

>> ---
>> Changelogs:
>> 
>> v2:
>> 
>> - Fix a bug which occurs when descendant of a matched resource matches.
>> 
>> - Revise the patch description and comments to make the algorithm clearer.
>>   Thanks Dan and David's comments!
>> 
>> - Link to v1: https://lore.kernel.org/linux-mm/20240816020723.771196-1-ying.huang@intel.com/
>
> ...
>
>>  {
>>  	struct resource res;
>>  	int type = 0; int other = 0;
>> -	struct resource *p;
>> +	struct resource *p, *dp;
>> +	resource_size_t ostart, oend;
>> +	bool is_type, covered;
>
> Maybe keep more reversed xmas tree ordering?

OK.

>>  	res.start = start;
>>  	res.end = start + size - 1;
>>  
>>  	for (p = parent->child; p ; p = p->sibling) {
>> -		bool is_type = (((p->flags & flags) == flags) &&
>> -				((desc == IORES_DESC_NONE) ||
>> -				 (desc == p->desc)));
>> -
>> -		if (resource_overlaps(p, &res))
>> -			is_type ? type++ : other++;
>> +		if (!resource_overlaps(p, &res))
>> +			continue;
>> +		is_type = (((p->flags & flags) == flags) &&
>> +			   ((desc == IORES_DESC_NONE) || (desc == p->desc)));
>
> In the original code and here the most outer parentheses are redundant.

OK.  Will remove them.

>> +		if (is_type) {
>> +			type++;
>> +			continue;
>> +		}
>> +		/*
>> +		 * Continue to search in descendant resources as if the
>> +		 * matched descendant resources cover some ranges of 'p'.
>> +		 *
>> +		 * |------------- "CXL Window 0" ------------|
>> +		 * |-- "System RAM" --|
>> +		 *
>> +		 * looks as if
>> +		 *
>> +		 * |-- "System RAM" --||-- "CXL Window 0a" --|
>> +		 *
>> +		 * in effect.
>> +		 */
>> +		covered = false;
>
>> +		ostart = max(res.start, p->start);
>> +		oend = min(res.end, p->end);
>
> Isn't a reinvention of resource_intersection()? With that in place you may also
> drop the above resource_overlaps().

sizeof(struct resource) == 8 * sizeof(unsigned long)

Just want to avoid to define another struct resource on stack.

>> +		for_each_resource(p, dp, false) {
>> +			if (!resource_overlaps(dp, &res))
>> +				continue;
>> +			is_type = (((dp->flags & flags) == flags) &&
>> +				   ((desc == IORES_DESC_NONE) ||
>> +				    (desc == dp->desc)));
>
> Most outer parentheses are redundant. With them being dropped you also may
> unite the last two lines into a single one.

Sure.  Will do that.

>> +			if (is_type) {
>> +				type++;
>> +				if (dp->start > ostart)
>> +					break;
>> +				if (dp->end >= oend) {
>> +					covered = true;
>> +					break;
>> +				}
>> +				ostart = max(ostart, dp->end + 1);
>> +			}
>> +		}
>> +		if (!covered)
>> +			other++;
>>  	}

--
Best Regards,
Huang, Ying
Andy Shevchenko Sept. 4, 2024, 12:43 p.m. UTC | #7
On Wed, Sep 04, 2024 at 03:48:44PM +0800, Huang, Ying wrote:
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> > On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:

...

> >> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> >> Cc: Dan Williams <dan.j.williams@intel.com>
> >> Cc: David Hildenbrand <david@redhat.com>
> >> Cc: Davidlohr Bueso <dave@stgolabs.net>
> >> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> >> Cc: Dave Jiang <dave.jiang@intel.com>
> >> Cc: Alison Schofield <alison.schofield@intel.com>
> >> Cc: Vishal Verma <vishal.l.verma@intel.com>
> >> Cc: Ira Weiny <ira.weiny@intel.com>
> >> Cc: Alistair Popple <apopple@nvidia.com>
> >> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >> Cc: Baoquan He <bhe@redhat.com>
> >
> > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > message.
> 
> Emm... It appears that it's a common practice to include "Cc" in the
> commit log.

For what benefit? (Note, nowadays we have lore.kernel.org which is under
the control of Linux kernel project)

Personally I see only downsides of these being inside the commit message.

Here is a discussion about this
https://lore.kernel.org/linux-doc/20240423132024.2368662-1-andriy.shevchenko@linux.intel.com/

...

> >> +		ostart = max(res.start, p->start);
> >> +		oend = min(res.end, p->end);
> >
> > Isn't a reinvention of resource_intersection()? With that in place you may also
> > drop the above resource_overlaps().
> 
> sizeof(struct resource) == 8 * sizeof(unsigned long)
> 
> Just want to avoid to define another struct resource on stack.

Is it a problem?
Dan Williams Sept. 4, 2024, 11:58 p.m. UTC | #8
Huang, Ying wrote:
> Hi, Andy,
> 
> Thanks for your comments!
> 
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
[..]
> 
> > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > message.
> 
> Emm... It appears that it's a common practice to include "Cc" in the
> commit log.

Yes, just ignore this feedback, it goes against common practice. Cc list
as is looks sane to me.
Dan Williams Sept. 4, 2024, 11:58 p.m. UTC | #9
Bjorn Helgaas wrote:
> On Wed, Aug 21, 2024 at 06:43:43PM -0700, Dan Williams wrote:
> > Hi Bjorn,
> > 
> > Ying is out for the next week or so, so I will address your comments and
> > resubmit as I think this is a potentially urgent fix.
> 
> Sounds good, thanks.

Ying is back now and will handle the next revision.
Huang, Ying Sept. 5, 2024, 3 a.m. UTC | #10
Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

> On Wed, Sep 04, 2024 at 03:48:44PM +0800, Huang, Ying wrote:
>> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
>> > On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:
>

[snip]

>
>> >> +		ostart = max(res.start, p->start);
>> >> +		oend = min(res.end, p->end);
>> >
>> > Isn't a reinvention of resource_intersection()? With that in place you may also
>> > drop the above resource_overlaps().
>> 
>> sizeof(struct resource) == 8 * sizeof(unsigned long)
>> 
>> Just want to avoid to define another struct resource on stack.
>
> Is it a problem?

Not a serious problem.  Just prefer to avoid too much stack usage.
IMHO, the benefit isn't large too.

--
Best Regards,
Huang, Ying
Andy Shevchenko Sept. 5, 2024, 10:56 a.m. UTC | #11
On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
> Huang, Ying wrote:
> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

[..]

> > > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > > message.
> > 
> > Emm... It appears that it's a common practice to include "Cc" in the
> > commit log.
> 
> Yes, just ignore this feedback, it goes against common practice. Cc list
> as is looks sane to me.

It seems nobody can give technical arguments why it's better than just keeping
them outside of the commit message. Mantra "common practice" nowadays is
questionable.
Andy Shevchenko Sept. 5, 2024, 10:57 a.m. UTC | #12
On Thu, Sep 05, 2024 at 11:00:14AM +0800, Huang, Ying wrote:
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> > On Wed, Sep 04, 2024 at 03:48:44PM +0800, Huang, Ying wrote:
> >> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> >> > On Mon, Aug 19, 2024 at 10:34:13AM +0800, Huang Ying wrote:

[snip]

> >> >> +		ostart = max(res.start, p->start);
> >> >> +		oend = min(res.end, p->end);
> >> >
> >> > Isn't a reinvention of resource_intersection()? With that in place you may also
> >> > drop the above resource_overlaps().
> >> 
> >> sizeof(struct resource) == 8 * sizeof(unsigned long)
> >> 
> >> Just want to avoid to define another struct resource on stack.
> >
> > Is it a problem?
> 
> Not a serious problem.  Just prefer to avoid too much stack usage.
> IMHO, the benefit isn't large too.

Benefit is so use existing APIs and see from the code where we handle resources
(in semantic meaning).
David Hildenbrand Sept. 5, 2024, 11:08 a.m. UTC | #13
On 05.09.24 12:56, Andy Shevchenko wrote:
> On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
>> Huang, Ying wrote:
>>> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> 
> [..]
> 
>>>> You may move Cc list after '---', so it won't unnecessarily pollute the commit
>>>> message.
>>>
>>> Emm... It appears that it's a common practice to include "Cc" in the
>>> commit log.
>>
>> Yes, just ignore this feedback, it goes against common practice. Cc list
>> as is looks sane to me.
> 
> It seems nobody can give technical arguments why it's better than just keeping
> them outside of the commit message. Mantra "common practice" nowadays is
> questionable.

Just look at how patches look like in the git tree that Andrew picks up. 
(IIRC, he adds a bunch of CCs himself that are not even part of the 
original patch).

Having in the git tree who was actually involved/CCed can be quite 
valuable. More helpful than get_maintainers.pl sometimes.
Andy Shevchenko Sept. 5, 2024, 12:36 p.m. UTC | #14
On Thu, Sep 05, 2024 at 01:08:35PM +0200, David Hildenbrand wrote:
> On 05.09.24 12:56, Andy Shevchenko wrote:
> > On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
> > > Huang, Ying wrote:
> > > > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

[..]

> > > > > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > > > > message.
> > > > 
> > > > Emm... It appears that it's a common practice to include "Cc" in the
> > > > commit log.
> > > 
> > > Yes, just ignore this feedback, it goes against common practice. Cc list
> > > as is looks sane to me.
> > 
> > It seems nobody can give technical arguments why it's better than just keeping
> > them outside of the commit message. Mantra "common practice" nowadays is
> > questionable.
> 
> Just look at how patches look like in the git tree that Andrew picks up.
> (IIRC, he adds a bunch of CCs himself that are not even part of the original
> patch).

I know that and it's historical, he has a lot of the scripts that work and when
he moved to the Git it was another long story. Now you even can see how he uses
Git in his quilt approach. So, it's an exceptional and not usual workflow, hence
bad example. Try again :-)

> Having in the git tree who was actually involved/CCed can be quite valuable.
> More helpful than get_maintainers.pl sometimes.

First of all, there is no guarantee they _were_ involved. From this perspective
having Link: tag instead has much more value and supports my side of arguments.

Anything else?
David Hildenbrand Sept. 5, 2024, 12:42 p.m. UTC | #15
On 05.09.24 14:36, Andy Shevchenko wrote:
> On Thu, Sep 05, 2024 at 01:08:35PM +0200, David Hildenbrand wrote:
>> On 05.09.24 12:56, Andy Shevchenko wrote:
>>> On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
>>>> Huang, Ying wrote:
>>>>> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> 
> [..]
> 
>>>>>> You may move Cc list after '---', so it won't unnecessarily pollute the commit
>>>>>> message.
>>>>>
>>>>> Emm... It appears that it's a common practice to include "Cc" in the
>>>>> commit log.
>>>>
>>>> Yes, just ignore this feedback, it goes against common practice. Cc list
>>>> as is looks sane to me.
>>>
>>> It seems nobody can give technical arguments why it's better than just keeping
>>> them outside of the commit message. Mantra "common practice" nowadays is
>>> questionable.
>>
>> Just look at how patches look like in the git tree that Andrew picks up.
>> (IIRC, he adds a bunch of CCs himself that are not even part of the original
>> patch).
> 
> I know that and it's historical, he has a lot of the scripts that work and when
> he moved to the Git it was another long story. Now you even can see how he uses
> Git in his quilt approach. So, it's an exceptional and not usual workflow, hence
> bad example. Try again :-)

Point is, it doesn't matter what we do in this patch here if Andrew will 
unify it at all.

> 
>> Having in the git tree who was actually involved/CCed can be quite valuable.
>> More helpful than get_maintainers.pl sometimes.
> 
> First of all, there is no guarantee they _were_ involved. From this perspective
> having Link: tag instead has much more value and supports my side of arguments.

Link is certainly preferable. Usually when I fix a commit, I make sure 
to CC the people that are listed for the patch, because it at least 
should have ended up in their mailbox.

Often, it also helped to see if a buggy commit was at least CCed to the 
right persons without digging through mailing list archives.
Andy Shevchenko Sept. 5, 2024, 12:50 p.m. UTC | #16
On Thu, Sep 05, 2024 at 02:42:05PM +0200, David Hildenbrand wrote:
> On 05.09.24 14:36, Andy Shevchenko wrote:
> > On Thu, Sep 05, 2024 at 01:08:35PM +0200, David Hildenbrand wrote:
> > > On 05.09.24 12:56, Andy Shevchenko wrote:
> > > > On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
> > > > > Huang, Ying wrote:
> > > > > > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

[..]

> > > > > > > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > > > > > > message.
> > > > > > 
> > > > > > Emm... It appears that it's a common practice to include "Cc" in the
> > > > > > commit log.
> > > > > 
> > > > > Yes, just ignore this feedback, it goes against common practice. Cc list
> > > > > as is looks sane to me.
> > > > 
> > > > It seems nobody can give technical arguments why it's better than just keeping
> > > > them outside of the commit message. Mantra "common practice" nowadays is
> > > > questionable.
> > > 
> > > Just look at how patches look like in the git tree that Andrew picks up.
> > > (IIRC, he adds a bunch of CCs himself that are not even part of the original
> > > patch).
> > 
> > I know that and it's historical, he has a lot of the scripts that work and when
> > he moved to the Git it was another long story. Now you even can see how he uses
> > Git in his quilt approach. So, it's an exceptional and not usual workflow, hence
> > bad example. Try again :-)
> 
> Point is, it doesn't matter what we do in this patch here if Andrew will
> unify it at all.

Point is, that this is exceptional. And better to teach people based on better
practices, no?

> > > Having in the git tree who was actually involved/CCed can be quite valuable.
> > > More helpful than get_maintainers.pl sometimes.
> > 
> > First of all, there is no guarantee they _were_ involved. From this perspective
> > having Link: tag instead has much more value and supports my side of arguments.
> 
> Link is certainly preferable. Usually when I fix a commit, I make sure to CC
> the people that are listed for the patch, because it at least should have
> ended up in their mailbox.
> 
> Often, it also helped to see if a buggy commit was at least CCed to the
> right persons without digging through mailing list archives.

How is it better than having it in lore.kernel.org in archives where you even
see who _actually_ participated in discussion, if any?

Again, Cc neither in the Git commit, nor in the email guarantees the people
were involved. Having Cc in the commit just a big noise that pollutes it.
Especially I do not understand at all Cc: mailing-list@bla.bla.bla cases.
They are not people, they have a lot of archives besides lore.kernel.org,
only waste of resources in all means of that. I tried to summarize that in
the submitted patches to the documentation, that I referred earlier in this
thread to.
David Hildenbrand Sept. 5, 2024, 12:57 p.m. UTC | #17
On 05.09.24 14:50, Andy Shevchenko wrote:
> On Thu, Sep 05, 2024 at 02:42:05PM +0200, David Hildenbrand wrote:
>> On 05.09.24 14:36, Andy Shevchenko wrote:
>>> On Thu, Sep 05, 2024 at 01:08:35PM +0200, David Hildenbrand wrote:
>>>> On 05.09.24 12:56, Andy Shevchenko wrote:
>>>>> On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
>>>>>> Huang, Ying wrote:
>>>>>>> Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> 
> [..]
> 
>>>>>>>> You may move Cc list after '---', so it won't unnecessarily pollute the commit
>>>>>>>> message.
>>>>>>>
>>>>>>> Emm... It appears that it's a common practice to include "Cc" in the
>>>>>>> commit log.
>>>>>>
>>>>>> Yes, just ignore this feedback, it goes against common practice. Cc list
>>>>>> as is looks sane to me.
>>>>>
>>>>> It seems nobody can give technical arguments why it's better than just keeping
>>>>> them outside of the commit message. Mantra "common practice" nowadays is
>>>>> questionable.
>>>>
>>>> Just look at how patches look like in the git tree that Andrew picks up.
>>>> (IIRC, he adds a bunch of CCs himself that are not even part of the original
>>>> patch).
>>>
>>> I know that and it's historical, he has a lot of the scripts that work and when
>>> he moved to the Git it was another long story. Now you even can see how he uses
>>> Git in his quilt approach. So, it's an exceptional and not usual workflow, hence
>>> bad example. Try again :-)
>>
>> Point is, it doesn't matter what we do in this patch here if Andrew will
>> unify it at all.
> 
> Point is, that this is exceptional. And better to teach people based on better
> practices, no?

"Better" in your opinion. I don't care about a couple of Cc lines in a 
git commit. They've been useful for me, apparently not for you.

If you succeed in convincing Andrew to change it, then Andrew can fixup 
his scripts to remove all of these from the patches he sends out.

> 
>>>> Having in the git tree who was actually involved/CCed can be quite valuable.
>>>> More helpful than get_maintainers.pl sometimes.
>>>
>>> First of all, there is no guarantee they _were_ involved. From this perspective
>>> having Link: tag instead has much more value and supports my side of arguments.
>>
>> Link is certainly preferable. Usually when I fix a commit, I make sure to CC
>> the people that are listed for the patch, because it at least should have
>> ended up in their mailbox.
>>
>> Often, it also helped to see if a buggy commit was at least CCed to the
>> right persons without digging through mailing list archives.
> 
> How is it better than having it in lore.kernel.org in archives where you even
> see who _actually_ participated in discussion, if any?

You might have to dig through multiple code revisions to find that out. 
Again, I used this in the past quite successfully.

> 
> Again, Cc neither in the Git commit, nor in the email guarantees the people
> were involved. Having Cc in the commit just a big noise that pollutes it.
> Especially I do not understand at all Cc: mailing-list@bla.bla.bla cases.
> They are not people, they have a lot of archives besides lore.kernel.org,
> only waste of resources in all means of that. I tried to summarize that in
> the submitted patches to the documentation, that I referred earlier in this
> thread to.

I, for my part, never add "Cc:" to patches or cover letters that refer 
to mailing lists. I add these manually to my git-send-email "--cc" list. 
I though Andrews script also fix that up, but I might be wrong.

Anyhow, this is a discussion to be had with Andrew, not with me, so feel 
free to engage with Andrew to change is scripts to throw away all CC.
Dan Williams Sept. 5, 2024, 9:37 p.m. UTC | #18
Andy Shevchenko wrote:
> On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
> > Huang, Ying wrote:
> > > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
> 
> [..]
> 
> > > > You may move Cc list after '---', so it won't unnecessarily pollute the commit
> > > > message.
> > > 
> > > Emm... It appears that it's a common practice to include "Cc" in the
> > > commit log.
> > 
> > Yes, just ignore this feedback, it goes against common practice. Cc list
> > as is looks sane to me.
> 
> It seems nobody can give technical arguments why it's better than just keeping
> them outside of the commit message. Mantra "common practice" nowadays is
> questionable.

Yes, question asked and answered. Not to your satisfaction, but the
people that have engaged to date have been cold to the idea.
Historically, reaching into other kernel developers workflows to
instantiate a new preference is a high risk low reward endeavor, please
moderate your advocacy accordingly.
Huang, Ying Sept. 6, 2024, 1:07 a.m. UTC | #19
Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:

> On Wed, Sep 04, 2024 at 04:58:20PM -0700, Dan Williams wrote:
>> Huang, Ying wrote:
>> > Andy Shevchenko <andriy.shevchenko@linux.intel.com> writes:
>
> [..]
>
>> > > You may move Cc list after '---', so it won't unnecessarily pollute the commit
>> > > message.
>> > 
>> > Emm... It appears that it's a common practice to include "Cc" in the
>> > commit log.
>> 
>> Yes, just ignore this feedback, it goes against common practice. Cc list
>> as is looks sane to me.
>
> It seems nobody can give technical arguments why it's better than just keeping
> them outside of the commit message. Mantra "common practice" nowadays is
> questionable.

Cc list is used by 0day test robot to notify relevant developers and
maintainers in addition to the author when reporting regressions.  That
is helpful information.

--
Best Regards,
Huang, Ying
diff mbox series

Patch

diff --git a/kernel/resource.c b/kernel/resource.c
index 14777afb0a99..60492f143275 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -542,18 +542,57 @@  static int __region_intersects(struct resource *parent, resource_size_t start,
 {
 	struct resource res;
 	int type = 0; int other = 0;
-	struct resource *p;
+	struct resource *p, *dp;
+	resource_size_t ostart, oend;
+	bool is_type, covered;
 
 	res.start = start;
 	res.end = start + size - 1;
 
 	for (p = parent->child; p ; p = p->sibling) {
-		bool is_type = (((p->flags & flags) == flags) &&
-				((desc == IORES_DESC_NONE) ||
-				 (desc == p->desc)));
-
-		if (resource_overlaps(p, &res))
-			is_type ? type++ : other++;
+		if (!resource_overlaps(p, &res))
+			continue;
+		is_type = (((p->flags & flags) == flags) &&
+			   ((desc == IORES_DESC_NONE) || (desc == p->desc)));
+		if (is_type) {
+			type++;
+			continue;
+		}
+		/*
+		 * Continue to search in descendant resources as if the
+		 * matched descendant resources cover some ranges of 'p'.
+		 *
+		 * |------------- "CXL Window 0" ------------|
+		 * |-- "System RAM" --|
+		 *
+		 * looks as if
+		 *
+		 * |-- "System RAM" --||-- "CXL Window 0a" --|
+		 *
+		 * in effect.
+		 */
+		covered = false;
+		ostart = max(res.start, p->start);
+		oend = min(res.end, p->end);
+		for_each_resource(p, dp, false) {
+			if (!resource_overlaps(dp, &res))
+				continue;
+			is_type = (((dp->flags & flags) == flags) &&
+				   ((desc == IORES_DESC_NONE) ||
+				    (desc == dp->desc)));
+			if (is_type) {
+				type++;
+				if (dp->start > ostart)
+					break;
+				if (dp->end >= oend) {
+					covered = true;
+					break;
+				}
+				ostart = max(ostart, dp->end + 1);
+			}
+		}
+		if (!covered)
+			other++;
 	}
 
 	if (type == 0)