diff mbox series

cxl: question about cxl qos_class verification

Message ID 20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p8
State New, archived
Headers show
Series cxl: question about cxl qos_class verification | expand

Commit Message

Wonjae Lee Feb. 5, 2024, 10:36 a.m. UTC
Hello,

To test the CXL driver with respect to QTG IDs on a real CXL device, I
connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).

However, during cxl endpoint probing, CDAT extraction and parsing works
fine, but cxl_qos_class_verify() for cxlmd does not run properly.

To be precise, when cxl_qos_class_verify() is executed, the below error
handling code is executed since cxlmd->endpoint is NULL:

    if (!cxl_root)
        return -ENODEV;


I'm not sure if I analyzed it correctly due to the complexity of the CXL
driver, but I think it's because the cxl_port driver execute
cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
the cxl_mem driver. See the dmesg log below, where I've added debugging
code.

# cxl_mem driver is adding the endpoint
[] cxl_mem mem0: call devm_cxl_add_enpoint
...
# endpoint port is probed, and cxl_qos_class_verify() runs
[] cxl_port endpoint5: call cxl_qos_class_verify
[] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
[] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
...
# cxl_mem driver sets cxlmd->endpoint
[] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
...


I did an experiment to validate the hypothesis. If I call
cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
cxl_qos_class_verify() runs well without problems.



Maybe there's something I'm missing. It would be very helpful if anyone
could comment on the above analysis.

Thanks,
Wonjae

Comments

Dave Jiang Feb. 5, 2024, 10:31 p.m. UTC | #1
On 2/5/24 3:36 AM, Wonjae Lee wrote:
> Hello,
> 
> To test the CXL driver with respect to QTG IDs on a real CXL device, I
> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> 
> However, during cxl endpoint probing, CDAT extraction and parsing works
> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> 
> To be precise, when cxl_qos_class_verify() is executed, the below error
> handling code is executed since cxlmd->endpoint is NULL:
> 
>     if (!cxl_root)
>         return -ENODEV;
> 
> 
> I'm not sure if I analyzed it correctly due to the complexity of the CXL
> driver, but I think it's because the cxl_port driver execute
> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> the cxl_mem driver. See the dmesg log below, where I've added debugging
> code.
> 
> # cxl_mem driver is adding the endpoint
> [] cxl_mem mem0: call devm_cxl_add_enpoint
> ...
> # endpoint port is probed, and cxl_qos_class_verify() runs
> [] cxl_port endpoint5: call cxl_qos_class_verify
> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> ...
> # cxl_mem driver sets cxlmd->endpoint
> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> ...
> 
> 
> I did an experiment to validate the hypothesis. If I call
> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> cxl_qos_class_verify() runs well without problems.
> 
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index c5c9d8e0d88d..33b39c6c46fd 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>         if (rc)
>                 return rc;
> 
> +       cxl_endpoint_parse_cdat(endpoint);
> +
>         if (!endpoint->dev.driver) {
>                 dev_err(&cxlmd->dev, "%s failed probe\n",
>                         dev_name(&endpoint->dev));
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index 97c21566677a..ee77aba62780 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> 
>         /* Cache the data early to ensure is_visible() works */
>         read_cdat_data(port);
> -       cxl_endpoint_parse_cdat(port);
> 
>         get_device(&cxlmd->dev);
>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> 
> 
> Maybe there's something I'm missing. It would be very helpful if anyone
> could comment on the above analysis.

I think this should fix the issue you are seeing?

https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039


> 
> Thanks,
> Wonjae
Wonjae Lee Feb. 6, 2024, 1:23 a.m. UTC | #2
On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
>
>
> On 2/5/24 3:36 AM, Wonjae Lee wrote:
> > Hello,
> >
> > To test the CXL driver with respect to QTG IDs on a real CXL device, I
> > connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> >
> > However, during cxl endpoint probing, CDAT extraction and parsing works
> > fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> >
> > To be precise, when cxl_qos_class_verify() is executed, the below error
> > handling code is executed since cxlmd->endpoint is NULL:
> >
> >     if (!cxl_root)
> >         return -ENODEV;
> >
> >
> > I'm not sure if I analyzed it correctly due to the complexity of the CXL
> > driver, but I think it's because the cxl_port driver execute
> > cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> > the cxl_mem driver. See the dmesg log below, where I've added debugging
> > code.
> >
> > # cxl_mem driver is adding the endpoint
> > [] cxl_mem mem0: call devm_cxl_add_enpoint
> > ...
> > # endpoint port is probed, and cxl_qos_class_verify() runs
> > [] cxl_port endpoint5: call cxl_qos_class_verify
> > [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> > [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> > ...
> > # cxl_mem driver sets cxlmd->endpoint
> > [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> > ...
> >
> >
> > I did an experiment to validate the hypothesis. If I call
> > cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> > cxl_qos_class_verify() runs well without problems.
> >
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index c5c9d8e0d88d..33b39c6c46fd 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> >         if (rc)
> >                 return rc;
> >
> > +       cxl_endpoint_parse_cdat(endpoint);
> > +
> >         if (!endpoint->dev.driver) {
> >                 dev_err(&cxlmd->dev, "%s failed probe\n",
> >                         dev_name(&endpoint->dev));
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index 97c21566677a..ee77aba62780 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> >
> >         /* Cache the data early to ensure is_visible() works */
> >         read_cdat_data(port);
> > -       cxl_endpoint_parse_cdat(port);
> >
> >         get_device(&cxlmd->dev);
> >         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> >
> >
> > Maybe there's something I'm missing. It would be very helpful if anyone
> > could comment on the above analysis.
>
> I think this should fix the issue you are seeing?
>
> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
>

Oh, you've already fixed it. I found that on the same testbed the commit
you mentioned resolves the issue.

Thanks for your response!

>
> >
> > Thanks,
> > Wonjae
>
Dave Jiang Feb. 6, 2024, 3:24 p.m. UTC | #3
On 2/5/24 6:23 PM, Wonjae Lee wrote:
> On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
>>
>>
>> On 2/5/24 3:36 AM, Wonjae Lee wrote:
>>> Hello,
>>>
>>> To test the CXL driver with respect to QTG IDs on a real CXL device, I
>>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
>>>
>>> However, during cxl endpoint probing, CDAT extraction and parsing works
>>> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
>>>
>>> To be precise, when cxl_qos_class_verify() is executed, the below error
>>> handling code is executed since cxlmd->endpoint is NULL:
>>>
>>>     if (!cxl_root)
>>>         return -ENODEV;
>>>
>>>
>>> I'm not sure if I analyzed it correctly due to the complexity of the CXL
>>> driver, but I think it's because the cxl_port driver execute
>>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
>>> the cxl_mem driver. See the dmesg log below, where I've added debugging
>>> code.
>>>
>>> # cxl_mem driver is adding the endpoint
>>> [] cxl_mem mem0: call devm_cxl_add_enpoint
>>> ...
>>> # endpoint port is probed, and cxl_qos_class_verify() runs
>>> [] cxl_port endpoint5: call cxl_qos_class_verify
>>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
>>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
>>> ...
>>> # cxl_mem driver sets cxlmd->endpoint
>>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
>>> ...
>>>
>>>
>>> I did an experiment to validate the hypothesis. If I call
>>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
>>> cxl_qos_class_verify() runs well without problems.
>>>
>>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>>> index c5c9d8e0d88d..33b39c6c46fd 100644
>>> --- a/drivers/cxl/mem.c
>>> +++ b/drivers/cxl/mem.c
>>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
>>>         if (rc)
>>>                 return rc;
>>>
>>> +       cxl_endpoint_parse_cdat(endpoint);
>>> +
>>>         if (!endpoint->dev.driver) {
>>>                 dev_err(&cxlmd->dev, "%s failed probe\n",
>>>                         dev_name(&endpoint->dev));
>>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
>>> index 97c21566677a..ee77aba62780 100644
>>> --- a/drivers/cxl/port.c
>>> +++ b/drivers/cxl/port.c
>>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
>>>
>>>         /* Cache the data early to ensure is_visible() works */
>>>         read_cdat_data(port);
>>> -       cxl_endpoint_parse_cdat(port);
>>>
>>>         get_device(&cxlmd->dev);
>>>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
>>>
>>>
>>> Maybe there's something I'm missing. It would be very helpful if anyone
>>> could comment on the above analysis.
>>
>> I think this should fix the issue you are seeing?
>>
>> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
>>
> 
> Oh, you've already fixed it. I found that on the same testbed the commit
> you mentioned resolves the issue.
> 
> Thanks for your response!

Ok if I add your Tested-by tag to that patch?
 
> 
>>
>>>
>>> Thanks,
>>> Wonjae
>>
Wonjae Lee Feb. 7, 2024, 12:19 a.m. UTC | #4
On Tue, Feb 06, 2024 at 08:24:04AM -0700, Dave Jiang wrote:
>
>
> On 2/5/24 6:23 PM, Wonjae Lee wrote:
> > On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote:
> >>
> >>
> >> On 2/5/24 3:36 AM, Wonjae Lee wrote:
> >>> Hello,
> >>>
> >>> To test the CXL driver with respect to QTG IDs on a real CXL device, I
> >>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3).
> >>>
> >>> However, during cxl endpoint probing, CDAT extraction and parsing works
> >>> fine, but cxl_qos_class_verify() for cxlmd does not run properly.
> >>>
> >>> To be precise, when cxl_qos_class_verify() is executed, the below error
> >>> handling code is executed since cxlmd->endpoint is NULL:
> >>>
> >>>     if (!cxl_root)
> >>>         return -ENODEV;
> >>>
> >>>
> >>> I'm not sure if I analyzed it correctly due to the complexity of the CXL
> >>> driver, but I think it's because the cxl_port driver execute
> >>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in
> >>> the cxl_mem driver. See the dmesg log below, where I've added debugging
> >>> code.
> >>>
> >>> # cxl_mem driver is adding the endpoint
> >>> [] cxl_mem mem0: call devm_cxl_add_enpoint
> >>> ...
> >>> # endpoint port is probed, and cxl_qos_class_verify() runs
> >>> [] cxl_port endpoint5: call cxl_qos_class_verify
> >>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL
> >>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL
> >>> ...
> >>> # cxl_mem driver sets cxlmd->endpoint
> >>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint
> >>> ...
> >>>
> >>>
> >>> I did an experiment to validate the hypothesis. If I call
> >>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set,
> >>> cxl_qos_class_verify() runs well without problems.
> >>>
> >>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> >>> index c5c9d8e0d88d..33b39c6c46fd 100644
> >>> --- a/drivers/cxl/mem.c
> >>> +++ b/drivers/cxl/mem.c
> >>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
> >>>         if (rc)
> >>>                 return rc;
> >>>
> >>> +       cxl_endpoint_parse_cdat(endpoint);
> >>> +
> >>>         if (!endpoint->dev.driver) {
> >>>                 dev_err(&cxlmd->dev, "%s failed probe\n",
> >>>                         dev_name(&endpoint->dev));
> >>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> >>> index 97c21566677a..ee77aba62780 100644
> >>> --- a/drivers/cxl/port.c
> >>> +++ b/drivers/cxl/port.c
> >>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> >>>
> >>>         /* Cache the data early to ensure is_visible() works */
> >>>         read_cdat_data(port);
> >>> -       cxl_endpoint_parse_cdat(port);
> >>>
> >>>         get_device(&cxlmd->dev);
> >>>         rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> >>>
> >>>
> >>> Maybe there's something I'm missing. It would be very helpful if anyone
> >>> could comment on the above analysis.
> >>
> >> I think this should fix the issue you are seeing?
> >>
> >> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039
> >>
> >
> > Oh, you've already fixed it. I found that on the same testbed the commit
> > you mentioned resolves the issue.
> >
> > Thanks for your response!
>
> Ok if I add your Tested-by tag to that patch?

Oh, sure. Feel free to add it:

Tested-by: Wonjae Lee <wj28.lee@samsung.com>

Thanks,
Wonjae
diff mbox series

Patch

diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index c5c9d8e0d88d..33b39c6c46fd 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -74,6 +74,8 @@  static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
        if (rc)
                return rc;

+       cxl_endpoint_parse_cdat(endpoint);
+
        if (!endpoint->dev.driver) {
                dev_err(&cxlmd->dev, "%s failed probe\n",
                        dev_name(&endpoint->dev));
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 97c21566677a..ee77aba62780 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -111,7 +111,6 @@  static int cxl_endpoint_port_probe(struct cxl_port *port)

        /* Cache the data early to ensure is_visible() works */
        read_cdat_data(port);
-       cxl_endpoint_parse_cdat(port);

        get_device(&cxlmd->dev);
        rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);