diff mbox series

[02/15] cxl/core: Check physical address before mapping it in devm_cxl_iomap_block()

Message ID 20220831081603.3415-3-rrichter@amd.com
State Superseded
Delegated to: Dan Williams
Headers show
Series cxl: Add support for Restricted CXL hosts (RCD mode) | expand

Commit Message

Robert Richter Aug. 31, 2022, 8:15 a.m. UTC
The physical base address of a CXL range can be invalid and is then
set to CXL_RESOURCE_NONE. Early check this case before mapping a
memory block in devm_cxl_iomap_block().

Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/core/regs.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jonathan Cameron Aug. 31, 2022, 8:56 a.m. UTC | #1
On Wed, 31 Aug 2022 10:15:50 +0200
Robert Richter <rrichter@amd.com> wrote:

> The physical base address of a CXL range can be invalid and is then
> set to CXL_RESOURCE_NONE. Early check this case before mapping a
> memory block in devm_cxl_iomap_block().

An example of when it is invalid would be a useful addition
to this description.  Feels to me like this is papering over the cracks
in something we should have ruled out earlier in the flow.

Jonathan

> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/regs.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 39a129c57d40..f216c017a474 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>  	void __iomem *ret_val;
>  	struct resource *res;
>  
> +	if (addr == CXL_RESOURCE_NONE)
> +		return NULL;
> +
>  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
>  	if (!res) {
>  		resource_size_t end = addr + length - 1;
Robert Richter Sept. 1, 2022, 5:31 a.m. UTC | #2
On 31.08.22 09:56:51, Jonathan Cameron wrote:
> On Wed, 31 Aug 2022 10:15:50 +0200
> Robert Richter <rrichter@amd.com> wrote:
> 
> > The physical base address of a CXL range can be invalid and is then
> > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > memory block in devm_cxl_iomap_block().
> 
> An example of when it is invalid would be a useful addition
> to this description.  Feels to me like this is papering over the cracks
> in something we should have ruled out earlier in the flow.

IIRC this happened during code development in cxl_pci_probe() when
cxl_map_regs() is called but the base addr was not determined
correctly before. It's similar to a NULL pointer check.

-Robert
Dan Williams Sept. 8, 2022, 5:48 a.m. UTC | #3
Robert Richter wrote:
> The physical base address of a CXL range can be invalid and is then
> set to CXL_RESOURCE_NONE. Early check this case before mapping a
> memory block in devm_cxl_iomap_block().
> 
> Signed-off-by: Robert Richter <rrichter@amd.com>
> ---
>  drivers/cxl/core/regs.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 39a129c57d40..f216c017a474 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
>  	void __iomem *ret_val;
>  	struct resource *res;
>  
> +	if (addr == CXL_RESOURCE_NONE)
> +		return NULL;
> +
>  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
>  	if (!res) {
>  		resource_size_t end = addr + length - 1;
> -- 
> 2.30.2
> 

devm_request_mem_region() succeeds for you when this happens? More
details about the failure scenario please.
Robert Richter Sept. 9, 2022, 12:19 p.m. UTC | #4
On 07.09.22 22:48:57, Dan Williams wrote:
> Robert Richter wrote:
> > The physical base address of a CXL range can be invalid and is then
> > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > memory block in devm_cxl_iomap_block().
> > 
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> > ---
> >  drivers/cxl/core/regs.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > index 39a129c57d40..f216c017a474 100644
> > --- a/drivers/cxl/core/regs.c
> > +++ b/drivers/cxl/core/regs.c
> > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> >  	void __iomem *ret_val;
> >  	struct resource *res;
> >  
> > +	if (addr == CXL_RESOURCE_NONE)
> > +		return NULL;
> > +
> >  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
> >  	if (!res) {
> >  		resource_size_t end = addr + length - 1;
> > -- 
> > 2.30.2
> > 
> 
> devm_request_mem_region() succeeds for you when this happens? More
> details about the failure scenario please.

No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is
calculated that even overflows. I only vaguely remember the exact
error message.

This may happen e.g. if the Component Register Block is missing in the
DVSEC. cxl_find_regblock() may fail then and returns
CXL_RESOURCE_NONE. There are a couple of code paths there
component_reg_phys is set to CXL_RESOURCE_NONE without exiting
immediately.

I saw it during code development, when I pre-inititalized a port with
component_reg_phys set to CXL_RESOURCE_NONE. Since that case can
generally happen, I think it must be checked.

-Robert
Dan Williams Sept. 16, 2022, 6:04 p.m. UTC | #5
Robert Richter wrote:
> On 07.09.22 22:48:57, Dan Williams wrote:
> > Robert Richter wrote:
> > > The physical base address of a CXL range can be invalid and is then
> > > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > > memory block in devm_cxl_iomap_block().
> > > 
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > ---
> > >  drivers/cxl/core/regs.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > index 39a129c57d40..f216c017a474 100644
> > > --- a/drivers/cxl/core/regs.c
> > > +++ b/drivers/cxl/core/regs.c
> > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > >  	void __iomem *ret_val;
> > >  	struct resource *res;
> > >  
> > > +	if (addr == CXL_RESOURCE_NONE)
> > > +		return NULL;
> > > +
> > >  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
> > >  	if (!res) {
> > >  		resource_size_t end = addr + length - 1;
> > > -- 
> > > 2.30.2
> > > 
> > 
> > devm_request_mem_region() succeeds for you when this happens? More
> > details about the failure scenario please.
> 
> No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is
> calculated that even overflows. I only vaguely remember the exact
> error message.
> 
> This may happen e.g. if the Component Register Block is missing in the
> DVSEC. cxl_find_regblock() may fail then and returns
> CXL_RESOURCE_NONE. There are a couple of code paths there
> component_reg_phys is set to CXL_RESOURCE_NONE without exiting
> immediately.
> 
> I saw it during code development, when I pre-inititalized a port with
> component_reg_phys set to CXL_RESOURCE_NONE. Since that case can
> generally happen, I think it must be checked.

I think Jonathan had it right when we posited that the code should
probably have failed before getting to this point. For example, the
scenarios where the driver looks for a component register block via the
register locator DVSEC are not valid for RCDs in the first instance.
Robert Richter Sept. 28, 2022, 10:28 a.m. UTC | #6
On 16.09.22 11:04:01, Dan Williams wrote:
> Robert Richter wrote:
> > On 07.09.22 22:48:57, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > The physical base address of a CXL range can be invalid and is then
> > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > > > memory block in devm_cxl_iomap_block().
> > > > 
> > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > ---
> > > >  drivers/cxl/core/regs.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > index 39a129c57d40..f216c017a474 100644
> > > > --- a/drivers/cxl/core/regs.c
> > > > +++ b/drivers/cxl/core/regs.c
> > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > > >  	void __iomem *ret_val;
> > > >  	struct resource *res;
> > > >  
> > > > +	if (addr == CXL_RESOURCE_NONE)
> > > > +		return NULL;
> > > > +
> > > >  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
> > > >  	if (!res) {
> > > >  		resource_size_t end = addr + length - 1;
> > > > -- 
> > > > 2.30.2
> > > > 
> > > 
> > > devm_request_mem_region() succeeds for you when this happens? More
> > > details about the failure scenario please.
> > 
> > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is
> > calculated that even overflows. I only vaguely remember the exact
> > error message.
> > 
> > This may happen e.g. if the Component Register Block is missing in the
> > DVSEC. cxl_find_regblock() may fail then and returns
> > CXL_RESOURCE_NONE. There are a couple of code paths there
> > component_reg_phys is set to CXL_RESOURCE_NONE without exiting
> > immediately.
> > 
> > I saw it during code development, when I pre-inititalized a port with
> > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can
> > generally happen, I think it must be checked.
> 
> I think Jonathan had it right when we posited that the code should
> probably have failed before getting to this point. For example, the
> scenarios where the driver looks for a component register block via the
> register locator DVSEC are not valid for RCDs in the first instance.

I am ok having the code reviewed to prevent such situations. But the
handling of component_reg_phys is by far not trivial as there are many
locations it is initialized and others where it is used. Call chain is
across multiple functions using various data sources for
component_reg_phys, so it is hard to proof this may never happen.
E.g. in add_port_attach_ep() I found this:

	component_reg_phys = find_component_registers(uport_dev);
	port = devm_cxl_add_port(&parent_port->dev, uport_dev,
        	component_reg_phys, parent_dport);

find_component_registers() and subsequent functions (e.g.
cxl_regmap_to_base()) may return CXL_RESOURCE_NONE. But it is written
to port without any further check in cxl_port_alloc():

	port->component_reg_phys = component_reg_phys;

It is then later directly used in devm_cxl_setup_hdm() to map io
ranges with devm_cxl_iomap_block().

Just an example, I did not go through other code paths.

IMO, the check is cheap and prevents weird follow up errors and log
messages. But I would be also ok, to just drop the patch. There are no
dependencies to this change. What do you think?

Thanks,

-Robert
Dan Williams Sept. 30, 2022, 7:07 p.m. UTC | #7
Robert Richter wrote:
> On 16.09.22 11:04:01, Dan Williams wrote:
> > Robert Richter wrote:
> > > On 07.09.22 22:48:57, Dan Williams wrote:
> > > > Robert Richter wrote:
> > > > > The physical base address of a CXL range can be invalid and is then
> > > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a
> > > > > memory block in devm_cxl_iomap_block().
> > > > > 
> > > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > > > ---
> > > > >  drivers/cxl/core/regs.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> > > > > index 39a129c57d40..f216c017a474 100644
> > > > > --- a/drivers/cxl/core/regs.c
> > > > > +++ b/drivers/cxl/core/regs.c
> > > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> > > > >  	void __iomem *ret_val;
> > > > >  	struct resource *res;
> > > > >  
> > > > > +	if (addr == CXL_RESOURCE_NONE)
> > > > > +		return NULL;
> > > > > +
> > > > >  	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
> > > > >  	if (!res) {
> > > > >  		resource_size_t end = addr + length - 1;
> > > > > -- 
> > > > > 2.30.2
> > > > > 
> > > > 
> > > > devm_request_mem_region() succeeds for you when this happens? More
> > > > details about the failure scenario please.
> > > 
> > > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is
> > > calculated that even overflows. I only vaguely remember the exact
> > > error message.
> > > 
> > > This may happen e.g. if the Component Register Block is missing in the
> > > DVSEC. cxl_find_regblock() may fail then and returns
> > > CXL_RESOURCE_NONE. There are a couple of code paths there
> > > component_reg_phys is set to CXL_RESOURCE_NONE without exiting
> > > immediately.
> > > 
> > > I saw it during code development, when I pre-inititalized a port with
> > > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can
> > > generally happen, I think it must be checked.
> > 
> > I think Jonathan had it right when we posited that the code should
> > probably have failed before getting to this point. For example, the
> > scenarios where the driver looks for a component register block via the
> > register locator DVSEC are not valid for RCDs in the first instance.
> 
> I am ok having the code reviewed to prevent such situations. But the
> handling of component_reg_phys is by far not trivial as there are many
> locations it is initialized and others where it is used. Call chain is
> across multiple functions using various data sources for
> component_reg_phys, so it is hard to proof this may never happen.
> E.g. in add_port_attach_ep() I found this:
> 
> 	component_reg_phys = find_component_registers(uport_dev);
> 	port = devm_cxl_add_port(&parent_port->dev, uport_dev,
>         	component_reg_phys, parent_dport);
> 
> find_component_registers() and subsequent functions (e.g.
> cxl_regmap_to_base()) may return CXL_RESOURCE_NONE. But it is written
> to port without any further check in cxl_port_alloc():
> 
> 	port->component_reg_phys = component_reg_phys;
> 
> It is then later directly used in devm_cxl_setup_hdm() to map io
> ranges with devm_cxl_iomap_block().
> 
> Just an example, I did not go through other code paths.
> 
> IMO, the check is cheap and prevents weird follow up errors and log
> messages. But I would be also ok, to just drop the patch. There are no
> dependencies to this change. What do you think?

Good points above...

Yeah, the expectation is that only "broken hardware" fails to provide
component registers, but the driver is currently deficient in that it
assumes everything is CXL 2.0.

Now you have me thinking it should go a bit further and do a WARN_ONCE()
when CXL_RESOURCE_NONE is passed, as those locations need better error
handling than just handling it like an ioremap() failure.
diff mbox series

Patch

diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 39a129c57d40..f216c017a474 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -165,6 +165,9 @@  void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
 	void __iomem *ret_val;
 	struct resource *res;
 
+	if (addr == CXL_RESOURCE_NONE)
+		return NULL;
+
 	res = devm_request_mem_region(dev, addr, length, dev_name(dev));
 	if (!res) {
 		resource_size_t end = addr + length - 1;