[RFC,00/12] ARM: MPAM: add support for priority partitioning control

Message ID 20230815152712.1760046-1-amitsinght@marvell.com (mailing list archive)

Message

Amit Singh Tomar Aug. 15, 2023, 3:27 p.m. UTC
Arm Memory System Resource Partitioning and Monitoring (MPAM) supports
different controls that can be applied to different resources in the system.
One example is the optional priority partitioning control, where a priority
value generated by one MSC propagates over the interconnect to another MSC
(known as downstream priority), or is applied within an MSC for internal
operations.

The Marvell implementation of Arm MPAM supports priority partitioning control
that allows the LLC MSC to generate priority values that get propagated (along
with read/write requests from upstream) to the DDR block. Within the DDR block,
the priority values are mapped to different traffic classes under the DDR QoS
strategy. The link[1] gives some idea about the DDR QoS strategy, and terms
like LPR, VPR and HPR.

Setup priority partitioning control under Resource control
----------------------------------------------------------
At present, resource control (resctrl) provides a basic interface to configure
CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
Arm MPAM uses it to support controls like cache portion partitioning (CPOR) and
MPAM bandwidth partitioning.

As an example, the "schemata" file under a resource control group contains
cache portion bitmaps and memory bandwidth allocations, and these are used to configure
the cache portion partitioning (CPOR) and MPAM bandwidth partitioning controls.

MB:0=0100
L3:0=ffff

But resctrl doesn't provide a way to set up the other controls that Arm MPAM provides
(for instance, the priority partitioning control mentioned above). To support this,
James has suggested using the already existing schemata to stay compatible with
portable software. The main idea behind this RFC is to start a discussion
on how resctrl can be extended to support priority partitioning control.

To support priority partitioning control, the "schemata" file is updated to accommodate
a priority field (upon detection of the priority partitioning capability), separated
from the CPBM by the delimiter ",".

L3:0=ffff,f where f indicates the maximum downstream priority value.

The dspri value gets programmed per partition, and can be used to override the
QoS value coming from upstream (CPU).
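
To make the field layout concrete, here is a minimal sketch of how a script could split such a line into its parts. parse_schemata_line is a hypothetical helper written only for illustration; it is not part of resctrl or of this patch set, and it assumes exactly the "RESOURCE:ID=CPBM[,PRIORITY]" layout described above.

```shell
#!/bin/sh
# Hypothetical helper: split "L3:0=ffff,f" into resource, instance id,
# CPBM and the optional downstream priority field.
parse_schemata_line() {
    line=$1
    resource=${line%%:*}     # text before the first ':'   -> L3
    rest=${line#*:}          # text after the first ':'    -> 0=ffff,f
    id=${rest%%=*}           # text before the '='         -> 0
    value=${rest#*=}         # text after the '='          -> ffff,f
    cpbm=${value%%,*}        # text before the first ','   -> ffff
    case $value in
        *,*) dspri=${value#*,} ;;   # priority field present -> f
        *)   dspri= ;;              # no priority field on this line
    esac
    printf '%s %s %s %s\n' "$resource" "$id" "$cpbm" "${dspri:-none}"
}

parse_schemata_line "L3:0=ffff,f"   # prints: L3 0 ffff f
parse_schemata_line "L3:0=ffff"     # prints: L3 0 ffff none
```

Note that software which expects a bare bitmap after the "=" would see "ffff,f" as the value, which is the compatibility question this RFC wants to discuss.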

RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, and ACPI
table is based on DEN0065A_MPAM_ACPI_2.0.

Test set-up and results:
------------------------

The downstream priority value feeds into the DRAM controller, and one of the important
things that it does with this value is to service requests sooner (based on the
traffic class), hence reducing latency without affecting performance.

DDR QoS traffic classes:

0-5   ----> Low priority value
6-10  ----> Medium priority value
11-15 ----> High priority value
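
As a rough illustration of these ranges, the sketch below maps a priority value (a hex digit, as it is written to schemata) to its traffic class. dspri_class is a hypothetical helper; the LPR/VPR/HPR names for the low/medium/high ranges follow the comments in the test script further down.

```shell
#!/bin/sh
# Hypothetical helper: classify a downstream priority value (hex digit)
# into the DDR QoS traffic class ranges listed above.
dspri_class() {
    prio=$(( 0x$1 ))                 # "a" -> 10, "f" -> 15
    if [ "$prio" -le 5 ]; then
        echo LPR                     # 0-5: low priority
    elif [ "$prio" -le 10 ]; then
        echo VPR                     # 6-10: medium priority
    elif [ "$prio" -le 15 ]; then
        echo HPR                     # 11-15: high priority
    else
        echo "priority out of range" >&2
        return 1
    fi
}

dspri_class 5    # prints: LPR
dspri_class a    # prints: VPR
dspri_class f    # prints: HPR
```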

Benchmark[4] used is multichase.

Two partitions, P1 and P2:

Partition P1:
-------------
Assigned core 0
100% BW assignment

Partition P2:
-------------
Assigned cores 1-79
100% BW assignment
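
The cpus values used for these partitions in the test script (1 for core 0, ffff,ffffffff,fffffffe for cores 1-79) are bitmasks written as comma-separated 32-bit hex words, most significant word first. A minimal sketch of how such a mask could be derived from a core range follows; cpu_mask is a hypothetical helper, not part of resctrl or of this patch set.

```shell
#!/bin/sh
# Hypothetical helper: build a cpus bitmask for cores FIRST..LAST
# as comma-separated 32-bit hex words, most significant word first.
cpu_mask() {
    first=$1
    last=$2
    w=$(( last / 32 ))               # index of the highest word needed
    out=
    while [ "$w" -ge 0 ]; do
        word=0
        bit=0
        while [ "$bit" -lt 32 ]; do
            cpu=$(( w * 32 + bit ))
            if [ "$cpu" -ge "$first" ] && [ "$cpu" -le "$last" ]; then
                word=$(( word | (1 << bit) ))
            fi
            bit=$(( bit + 1 ))
        done
        if [ -z "$out" ]; then
            out=$(printf '%x' "$word")          # top word: no zero padding
        else
            out=$out,$(printf '%08x' "$word")   # lower words: 8 hex digits
        fi
        w=$(( w - 1 ))
    done
    echo "$out"
}

cpu_mask 0 0     # prints: 1
cpu_mask 1 79    # prints: ffff,ffffffff,fffffffe
```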

Test Script:
------------
cd /sys/fs/resctrl
mkdir p1
cd p1
echo 1 > cpus                 ##### CPU mask 0x1: core 0
echo L3:1=8000,5 > schemata   ##### DSPRI set as 5 (lpr)
echo "MB:0=100" > schemata

cd /sys/fs/resctrl
mkdir p2
cd p2
echo ffff,ffffffff,fffffffe > cpus   ##### CPU mask: cores 1-79
echo L3:1=8000,0 > schemata
echo "MB:0=100" > schemata

### Loaded latency run: core 0 does chaseload (pointer chase) with low priority value 5, and cores 1-79 do a memory bandwidth run ###
./multiload -v -n 10 -t 80 -m 1G -c chaseload

cd /sys/fs/resctrl/p1

echo L3:1=8000,a > schemata  ##### DSPRI set as 0xa (vpr)

### Loaded latency run: core 0 does chaseload (pointer chase) with medium priority value a, and cores 1-79 do a memory bandwidth run ###
./multiload -v -n 10 -t 80 -m 1G -c chaseload

cd /sys/fs/resctrl/p1

echo L3:1=8000,f > schemata  ##### DSPRI set as 0xf (hpr)

### Loaded latency run: core 0 does chaseload (pointer chase) with high priority value f, and cores 1-79 do a memory bandwidth run ###
./multiload -v -n 10 -t 80 -m 1G -c chaseload

Results[5]:

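LPR latency is 204.862 ns vs VPR latency of 161.018 ns vs HPR latency of 134.210 ns.
These are the ChaseNS values from the logs in [5], which match ChasBEST over the 10 samples;
the corresponding ChasAVG values are 205.848 ns, 161.458 ns and 134.548 ns.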
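
For a quick sense of scale, the relative latency reduction can be worked out from the three figures above (204.862 ns, 161.018 ns, 134.210 ns). latency_drop is a hypothetical helper that only does the arithmetic; it is not part of multichase.

```shell
#!/bin/sh
# Hypothetical helper: percentage latency reduction going from BASE to NEW.
latency_drop() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

latency_drop 204.862 161.018   # LPR -> VPR, prints: 21.4
latency_drop 204.862 134.210   # LPR -> HPR, prints: 34.5
```

In other words, for this specific test, moving core 0 from the low to the high traffic class cuts the loaded pointer-chase latency by roughly a third.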

[1]: https://drops.dagstuhl.de/opus/volltexte/2021/13934/pdf/LIPIcs-ECRTS-2021-3.pdf
[2]: https://github.com/Amit-Radur/linux/commits/mpam_downstream_priority_work
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot/v6.2
[4]: https://github.com/google/multichase
[5]:

root@localhost:# ./dspri_test.sh
Info: Loaded Latency chase selected. A -l memload can be used to select a specific memory load
nr_threads = 80
page_size = 4096 bytes
total_memory = 1073741824 (1024.0 MiB)
stride = 256
tlb_locality = 262144
chase = chaseload
memload = stream-sum
run_test_type = RUN_CHASE_LOADED
main: sample_no=0 
main: sample_no=1  avg=204.9(ns)
 main: threads=79, Total(MiB/s)=343018.0, PerThread=4342
main: sample_no=2  avg=206.0(ns)
 main: threads=79, Total(MiB/s)=343038.0, PerThread=4342
main: sample_no=3  avg=206.4(ns)
 main: threads=79, Total(MiB/s)=342443.0, PerThread=4335
main: sample_no=4  avg=206.3(ns)
 main: threads=79, Total(MiB/s)=345156.0, PerThread=4369
main: sample_no=5  avg=205.6(ns)
 main: threads=79, Total(MiB/s)=343807.0, PerThread=4352
main: sample_no=6  avg=205.9(ns)
 main: threads=79, Total(MiB/s)=343593.0, PerThread=4349
main: sample_no=7  avg=206.3(ns)
 main: threads=79, Total(MiB/s)=344770.0, PerThread=4364
main: sample_no=8  avg=205.7(ns)
 main: threads=79, Total(MiB/s)=344935.0, PerThread=4366
main: sample_no=9  avg=205.3(ns)
 main: threads=79, Total(MiB/s)=343189.0, PerThread=4344
main: sample_no=10  avg=206.1(ns)
 main: threads=79, Total(MiB/s)=344455.0, PerThread=4360
ChasAVG=205.848485, ChasGEO=205.847944, ChasBEST=204.861518, ChasWORST=206.443386, ChasDEV=0.008   
LdAvgMibs=343840.400000, LdMaxMibs=345156.000000, LdMinMibs=342443.000000, LdDevMibs=0.008   
Samples	, Byte/thd	, ChaseThds	, ChaseNS	, ChaseMibs	, ChDeviate	, LoadThds	, LdMaxMibs	, LdAvgMibs	, LdDeviate	, ChaseArg	, MemLdArg
10    	, 1073741824 	, 1       	, 204.862 	, 37      	, 0.008   	, 79      	, 345156  	, 343840  	, 0.008   	, chaseload	, stream-sum
Info: Loaded Latency chase selected. A -l memload can be used to select a specific memory load
nr_threads = 80
page_size = 4096 bytes
total_memory = 1073741824 (1024.0 MiB)
stride = 256
tlb_locality = 262144
chase = chaseload
memload = stream-sum
run_test_type = RUN_CHASE_LOADED
main: sample_no=0 
main: sample_no=1  avg=161.4(ns)
 main: threads=79, Total(MiB/s)=342023.0, PerThread=4329
main: sample_no=2  avg=161.3(ns)
 main: threads=79, Total(MiB/s)=341773.0, PerThread=4326
main: sample_no=3  avg=161.4(ns)
 main: threads=79, Total(MiB/s)=342780.0, PerThread=4339
main: sample_no=4  avg=161.6(ns)
 main: threads=79, Total(MiB/s)=341275.0, PerThread=4320
main: sample_no=5  avg=161.0(ns)
 main: threads=79, Total(MiB/s)=342680.0, PerThread=4338
main: sample_no=6  avg=161.9(ns)
 main: threads=79, Total(MiB/s)=341538.0, PerThread=4323
main: sample_no=7  avg=161.5(ns)
 main: threads=79, Total(MiB/s)=345302.0, PerThread=4371
main: sample_no=8  avg=161.5(ns)
 main: threads=79, Total(MiB/s)=341352.0, PerThread=4321
main: sample_no=9  avg=161.5(ns)
 main: threads=79, Total(MiB/s)=341200.0, PerThread=4319
main: sample_no=10  avg=161.5(ns)
 main: threads=79, Total(MiB/s)=341874.0, PerThread=4328
ChasAVG=161.458012, ChasGEO=161.457856, ChasBEST=161.017587, ChasWORST=161.935907, ChasDEV=0.006   
LdAvgMibs=342179.700000, LdMaxMibs=345302.000000, LdMinMibs=341200.000000, LdDevMibs=0.012   
Samples	, Byte/thd	, ChaseThds	, ChaseNS	, ChaseMibs	, ChDeviate	, LoadThds	, LdMaxMibs	, LdAvgMibs	, LdDeviate	, ChaseArg	, MemLdArg
10    	, 1073741824 	, 1       	, 161.018 	, 47      	, 0.006   	, 79      	, 345302  	, 342180  	, 0.012   	, chaseload	, stream-sum
Info: Loaded Latency chase selected. A -l memload can be used to select a specific memory load
nr_threads = 80
page_size = 4096 bytes
total_memory = 1073741824 (1024.0 MiB)
stride = 256
tlb_locality = 262144
chase = chaseload
memload = stream-sum
run_test_type = RUN_CHASE_LOADED
main: sample_no=0 
main: sample_no=1  avg=134.3(ns)
 main: threads=79, Total(MiB/s)=345284.0, PerThread=4371
main: sample_no=2  avg=134.7(ns)
 main: threads=79, Total(MiB/s)=345295.0, PerThread=4371
main: sample_no=3  avg=134.4(ns)
 main: threads=79, Total(MiB/s)=344421.0, PerThread=4360
main: sample_no=4  avg=134.9(ns)
 main: threads=79, Total(MiB/s)=343273.0, PerThread=4345
main: sample_no=5  avg=134.5(ns)
 main: threads=79, Total(MiB/s)=345518.0, PerThread=4374
main: sample_no=6  avg=134.5(ns)
 main: threads=79, Total(MiB/s)=346052.0, PerThread=4380
main: sample_no=7  avg=134.5(ns)
 main: threads=79, Total(MiB/s)=342852.0, PerThread=4340
main: sample_no=8  avg=134.7(ns)
 main: threads=79, Total(MiB/s)=345818.0, PerThread=4377
main: sample_no=9  avg=134.2(ns)
 main: threads=79, Total(MiB/s)=344045.0, PerThread=4355
main: sample_no=10  avg=134.7(ns)
 main: threads=79, Total(MiB/s)=344345.0, PerThread=4359
ChasAVG=134.547983, ChasGEO=134.547841, ChasBEST=134.210254, ChasWORST=134.863073, ChasDEV=0.005   
LdAvgMibs=344690.300000, LdMaxMibs=346052.000000, LdMinMibs=342852.000000, LdDevMibs=0.009   
Samples	, Byte/thd	, ChaseThds	, ChaseNS	, ChaseMibs	, ChDeviate	, LoadThds	, LdMaxMibs	, LdAvgMibs	, LdDeviate	, ChaseArg	, MemLdArg
10    	, 1073741824 	, 1       	, 134.210 	, 57      	, 0.005   	, 79      	, 346052  	, 344690  	, 0.009   	, chaseload	, stream-sum

Amit Singh Tomar (12):
  arm_mpam: Handle resource instances mapped to different controls
  arm_mpam: resctrl: Detect priority partitioning capability
  arm_mpam: resctrl: Define new schemata format for priority partition
  fs/resctrl: Obtain CPBM upon priority partition presence
  fs/resctrl: Set-up downstream priority partition resources
  fs/resctrl: Extend schemata read for priority partition control
  arm_mpam: resctrl: Retrieve priority values from arch code
  fs/resctrl: Schemata write only for intended resource
  fs/resctrl: Extend schemata write for priority partition control
  arm_mpam: resctrl: Facilitate writing downstream priority value
  arm_mpam: Fix Downstream priority mask
  arm_mpam: Program Downstream priority value

 drivers/platform/mpam/mpam_devices.c  |  38 +++++++--
 drivers/platform/mpam/mpam_internal.h |   1 +
 drivers/platform/mpam/mpam_resctrl.c  |  64 +++++++++++---
 fs/resctrl/ctrlmondata.c              | 118 ++++++++++++++++++++++++--
 fs/resctrl/rdtgroup.c                 |  30 +++++++
 include/linux/resctrl.h               |  12 +++
 6 files changed, 235 insertions(+), 28 deletions(-)

Comments

Reinette Chatre Aug. 17, 2023, 7:11 p.m. UTC | #1
(+Tony)

Hi Amit,

On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
> Arm Memory System Resource Partitioning and Monitoring (MPAM) supports
> different controls that can be applied to different resources in the system
> For instance, an optional priority partitioning control where priority
> value is generated from one MSC, propagates over interconnect to other MSC
> (known as downstream priority), or can be applied within an MSC for internal
> operations.
> 
> Marvell implementation of ARM MPAM supports priority partitioning control
> that allows LLC MSC to generate priority values that gets propagated (along with
> read/write request from upstream) to DDR Block. Within the DDR block the
> priority values is mapped to different traffic class under DDR QoS strategy.
> The link[1] gives some idea about DDR QoS strategy, and terms like LPR, VPR
> and HPR.
> 
> Setup priority partitioning control under Resource control
> ----------------------------------------------------------
> At present, resource control (resctrl) provides basic interface to configure/set-up
> CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
> ARM MPAM uses it to support controls like Cache portion partition (CPOR), and 
> MPAM bandwidth partitioning.
> 
> As an example, "schemata" file under resource control group contains information about
> cache portion bitmaps, and memory bandwidth allocation, and these are used to configure
> Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
> 
> MB:0=0100
> L3:0=ffff
> 
> But resctrl doesn't provide a way to set-up other control that ARM MPAM provides
> (For instance, Priority partitioning control as mentioned above). To support this,
> James has suggested to use already existing schemata to be compatible with 
> portable software, and this is the main idea behind this RFC is to have some kind
> of discussion on how resctrl can be extended to support priority partitioning control.
> 
> To support Priority partitioning control, "schemata" file is updated to accommodate
> priority field (upon priority partitioning capability detection), separated from CPBM
> using delimiter ",".
> 
> L3:0=ffff,f where f indicates downstream priority max value.
> 
> These dspri value gets programmed per partition, that can be used to override 
> QoS value coming from upstream (CPU).
> 
> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, and ACPI
> table is based on DEN0065A_MPAM_ACPI_2.0.
>

There are some aspects of this that I think we should be cautious about. First,
there may well be more properties in the future that need to be associated with
a resource allocation, and these may differ between architectures
and individual platforms. Second, user space needs a way to know which properties
are supported and what the valid parameters are.

At a high level I thus understand the goal to be to add support for assigning a
property to a resource allocation, with "Priority partitioning control" being
the first property.

To that end, I have a few questions:
* How can this interface be expanded to support more properties with the
  expectation that a system/architecture may not support all resctrl supported
  properties?
* Is it possible for support for properties to vary between, for example, different
  MSCs in the system? From resctrl side it may mean that there would be a resource,
  for example "L3", with multiple instances, for example, cache with id #0, cache
  with id#1, etc. but the supported properties or valid values of properties
  may vary between the instances?
* How can user space know that a system supports "Priority partitioning control"?
  User space needs to know when/if it can attempt to write a priority to the
  schemata.
* How can user space know what priority values are valid for a particular system?

> Test set-up and results:
> ------------------------
> 
> The downstream priority value feeds into DRAM controller, and one of the important
> thing that it does with this value is to service the requests sooner (based on the 
> traffic class), hence reducing latency without affecting performance.

Could you please elaborate here? I expected reduced latency to have a big impact
on performance.

> 
> Within the DDR QoS traffic class.
> 
> 0--5 ----> Low priority value
> 6-10 ----> Medium priority value
> 11-15 ----> High priority value
> 
> Benchmark[4] used is multichase.
> 
> Two partition P1 and P2:
> 
> Partition P1:
> -------------
> Assigned core 0
> 100% BW assignment
> 
> Partition P2:
> -------------
> Assigned cores 1-79
> 100% BW assignment
> 
> Test Script:
> -----------
> mkdir p1
> cd p1
> echo 1 > cpus
> echo L3:1=8000,5 > schemata   ##### DSPRI set as 5 (lpr)
> echo "MB:0=100" > schemata
> 
> mkdir p2
> cd p2
> echo ffff,ffffffff,fffffffe > cpus
> echo L3:1=8000,0 > schemata
> echo "MB:0=100" > schemata
> 
> ### Loaded latency run, core 0 does chaseload (pointer chase) with low priority value 5, and cores 1-79 does memory bandwidth run ###

Could you please elaborate on what is meant by a "memory bandwidth run"?

> ./multiload -v -n 10 -t 80 -m 1G -c chaseload  
> 
> cd /sys/fs/resctrl/p1
> 
> echo L3:1=8000,a > schemata  ##### DSPRI set as 0xa (vpr)
> 
> ### Loaded latency run, core 0 does chaseload (pointer chase) with medium priority value a, and cores 1-79 does memory bandwidth run ###
> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
> 
> cd /sys/fs/resctrl/p1
> 
> echo L3:1=8000,f > schemata  ##### DSPRI set as 0xf (hpr)
> 
> ### Loaded latency run where core 0 does chaseload (pointer chase) with high priority value f, and cores 1-79 does memory bandwidth run ###
> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
> 
> Results[5]:
> 
> LPR average latency is 204.862(ns) vs VPR average latency is 161.018(ns) vs HPR average latency is 134.210(ns).

Reinette
Reinette Chatre Aug. 17, 2023, 8:29 p.m. UTC | #2
Hi Amit,

On 8/17/2023 12:11 PM, Reinette Chatre wrote:
> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:

>>
>> Within the DDR QoS traffic class.
>>
>> 0--5 ----> Low priority value
>> 6-10 ----> Medium priority value
>> 11-15 ----> High priority value
>>
>> Benchmark[4] used is multichase.
>>
>> Two partition P1 and P2:
>>
>> Partition P1:
>> -------------
>> Assigned core 0
>> 100% BW assignment
>>
>> Partition P2:
>> -------------
>> Assigned cores 1-79
>> 100% BW assignment
>>
>> Test Script:
>> -----------
>> mkdir p1
>> cd p1
>> echo 1 > cpus
>> echo L3:1=8000,5 > schemata   ##### DSPRI set as 5 (lpr)
>> echo "MB:0=100" > schemata

I peeked at the next commit and I am missing something.

It looks like resource instances do indeed need to
support different controls, so that seems to answer my earlier
question. How to let the user know what is supported where
remains an open question, now with the understanding that the information
is required per resource instance.

The first commit mentions that #0 has the Priority
partitioning feature, but in these examples the schemata
of #1 is updated to modify the priority. Also, if I
understand correctly, CPOR and priority partitioning
are mutually exclusive, so I find it confusing to
see a bitmap and a priority written to a single resource.


Reinette
Peter Newman Aug. 22, 2023, 9:01 a.m. UTC | #3
Hi Amit,

On Tue, Aug 15, 2023 at 5:27 PM Amit Singh Tomar <amitsinght@marvell.com> wrote:
> As an example, "schemata" file under resource control group contains information about
> cache portion bitmaps, and memory bandwidth allocation, and these are used to configure
> Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>
> MB:0=0100
> L3:0=ffff
>
> But resctrl doesn't provide a way to set-up other control that ARM MPAM provides
> (For instance, Priority partitioning control as mentioned above). To support this,
> James has suggested to use already existing schemata to be compatible with
> portable software, and this is the main idea behind this RFC is to have some kind
> of discussion on how resctrl can be extended to support priority partitioning control.
>
> To support Priority partitioning control, "schemata" file is updated to accommodate
> priority field (upon priority partitioning capability detection), separated from CPBM
> using delimiter ",".
>
> L3:0=ffff,f where f indicates downstream priority max value.

Do we really have to mash two controls into the same schema? In the
CDP example, the code/data controls are presented as multiple schema,
for example: "L3CODE, L3DATA"

Especially when reading back the schemata file, it seems like it would
be simpler for existing software to ignore unfamiliar schema lines in
the schemata file than to overlook the introduction of a comma to the
CBM in the existing "L3" schema.
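
A minimal sketch of the kind of reader this argument has in mind: it acts only on schema names it knows and skips every other line, so a priority control exposed as its own line would pass through harmlessly. The "L3PRI" name is purely hypothetical and used only for illustration; known_schemata is likewise a made-up helper.

```shell
#!/bin/sh
# Hypothetical reader: keep only the schema lines this tool understands
# (L3 and MB here) and ignore anything unfamiliar.
known_schemata() {
    while IFS= read -r line; do
        name=${line%%:*}      # schema name before the ':'
        name=${name##* }      # strip leading padding, if any
        case $name in
            L3|MB) printf '%s\n' "$line" ;;
            *)     ;;         # unfamiliar schema: skip the line
        esac
    done
}

# "L3PRI" is a made-up schema name standing in for a separate priority line.
printf 'MB:0=0100\nL3:0=ffff\nL3PRI:0=f\n' | known_schemata
# prints:
# MB:0=0100
# L3:0=ffff
```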

Thanks!
-Peter
Amit Singh Tomar Aug. 22, 2023, 12:44 p.m. UTC | #4
Hi Reinette,

Thanks for having a look!

-----Original Message-----
From: Reinette Chatre <reinette.chatre@intel.com> 
Sent: Friday, August 18, 2023 12:41 AM
To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

(+Tony)

Hi Amit,

On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
> Arm Memory System Resource Partitioning and Monitoring (MPAM) supports 
> different controls that can be applied to different resources in the 
> system For instance, an optional priority partitioning control where 
> priority value is generated from one MSC, propagates over interconnect 
> to other MSC (known as downstream priority), or can be applied within 
> an MSC for internal operations.
> 
> Marvell implementation of ARM MPAM supports priority partitioning 
> control that allows LLC MSC to generate priority values that gets 
> propagated (along with read/write request from upstream) to DDR Block. 
> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
> The link[1] gives some idea about DDR QoS strategy, and terms like 
> LPR, VPR and HPR.
> 
> Setup priority partitioning control under Resource control
> ----------------------------------------------------------
> At present, resource control (resctrl) provides basic interface to 
> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
> ARM MPAM uses it to support controls like Cache portion partition 
> (CPOR), and MPAM bandwidth partitioning.
> 
> As an example, "schemata" file under resource control group contains 
> information about cache portion bitmaps, and memory bandwidth 
> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
> 
> MB:0=0100
> L3:0=ffff
> 
> But resctrl doesn't provide a way to set-up other control that ARM 
> MPAM provides (For instance, Priority partitioning control as 
> mentioned above). To support this, James has suggested to use already 
> existing schemata to be compatible with portable software, and this is 
> the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
> 
> To support Priority partitioning control, "schemata" file is updated 
> to accommodate priority field (upon priority partitioning capability 
> detection), separated from CPBM using delimiter ",".
> 
> L3:0=ffff,f where f indicates downstream priority max value.
> 
> These dspri value gets programmed per partition, that can be used to 
> override QoS value coming from upstream (CPU).
> 
> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>

There are some aspects of this that I think we should be cautious about. First, there may inevitably be more properties in the future that need to be associated with a resource allocation, these may indeed be different between architectures and individual platforms. Second, user space need a way to know which properties are supported and what valid parameters may be. 

On a high level I thus understand the goal be to add support for assigning a property to a resource allocation with "Priority partitioning control" being the first property.

To that end, I have a few questions:
* How can this interface be expanded to support more properties with the
  expectation that a system/architecture may not support all resctrl supported
  properties?
[>>] All these new controls ("Priority partitioning" is one of them) are detected as resource capabilities
        (via the Features Identification Register), and a control will not be probed if the system/architecture
        doesn't support it. From the resource control side, this means that users will never see such a control
        in the schemata file. For instance, on a platform that does not support Priority partitioning control,
        the schemata file looks like:

       # cat schemata
           L3:1=ffff

        As opposed to when the system has Priority partitioning control:
        # cat schemata
           L3:1=ffff,f
      
        
* Is it possible for support for properties to vary between, for example, different
  MSCs in the system? From resctrl side it may mean that there would be a resource,
  for example "L3", with multiple instances, for example, cache with id #0, cache
  with id#1, etc. but the supported properties or valid values of properties
  may vary between the instances?
[>>] This is really implementation dependent, but we would expect that if multiple L3 instances
        across multiple dies implement this control, it would be uniform across them. But let's take a case
        where the L3 MSC instance on one socket has this control, and the L3 MSC instance on another
        socket doesn't. From the resctrl perspective, one would see this control
        only for the L3 instance that has it, and it would be programmed only for that L3 instance.

       L3:0=XXXX,X;L3:1=XXXX

       And as per the proposed format:

       L3:0=XXXX,PPART=X, L3:1=XXXX
       
* How can user space know that a system supports "Priority partitioning control"?
  User space needs to know when/if it can attempt to write a priority to the
  schemata.
[>>] At the moment, we label only the resource class, and we would like to propose that we
        also label newly added controls (under a resource class), so that the user can easily identify
        which control to program. For instance, the schemata file with the proposed changes
        would look like this:

        L3:0=XXXX,PPART=X

       where PPART = Priority partitioning control. Similarly, if the L3 resource class has one more capability, say cache capacity partitioning:

       L3:0=XXXX,PPART=X,CCAP=X

      The very first control would always be CAT/CPOR (with no label).
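
A minimal sketch of how user space could look up a control by label in this proposed format, so that the position of a field after the unlabeled CPBM no longer matters. get_control is a hypothetical helper, and the PPART/CCAP labels are the ones proposed above, not an existing resctrl interface.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a labeled control from a line in
# the proposed "L3:0=CPBM,LABEL=VALUE,..." format; fail if it is absent.
get_control() {
    value=${1#*=}            # drop "L3:0=" -> "ffff,PPART=f,CCAP=3"
    label=$2
    old_ifs=$IFS
    IFS=,
    set -- $value            # split the fields on ','
    IFS=$old_ifs
    shift                    # first field is the unlabeled CAT/CPOR bitmap
    for field in "$@"; do
        case $field in
            "$label"=*) echo "${field#*=}"; return 0 ;;
        esac
    done
    return 1                 # this instance does not expose the control
}

get_control "L3:0=ffff,PPART=f,CCAP=3" PPART   # prints: f
get_control "L3:0=ffff,CCAP=3" CCAP            # prints: 3
get_control "L3:1=ffff" PPART || echo "PPART not supported on this instance"
```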
      

        
* How can user space know what priority values are valid for a particular system?
[>>] Supported priority values are read from one of the MPAM Priority Partitioning registers, and in the
        schemata file the field is set to the maximum value, just like the cache portion bitmaps or memory bandwidth allocation.
        For instance:

        L3:0=ffff,f, where the max priority value is f, and the user can program/set values from 0-15
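
Under that scheme, a script could check a value against the maximum read back from schemata before writing it. A minimal sketch, where valid_dspri is a hypothetical helper and both arguments are hex digits as they appear in the schemata file:

```shell
#!/bin/sh
# Hypothetical helper: succeed if VALUE lies within 0..MAX (both given in hex).
valid_dspri() {
    max=$(( 0x$1 ))
    val=$(( 0x$2 ))
    [ "$val" -ge 0 ] && [ "$val" -le "$max" ]
}

valid_dspri f a  && echo "0xa is within 0-15"
valid_dspri f 10 || echo "0x10 exceeds the maximum"
```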
      

> Test set-up and results:
> ------------------------
> 
> The downstream priority value feeds into DRAM controller, and one of 
> the important thing that it does with this value is to service the 
> requests sooner (based on the traffic class), hence reducing latency without affecting performance.

Could you please elaborate here? I expected reduced latency to have a big impact on performance.
[>>] To be clear, by performance we meant memory bandwidth, and with this specific configuration/test
       we see priority partitioning as a utility to guarantee lower latency. We are yet to explore its effect
       on the memory bandwidth side.
       
> 
> Within the DDR QoS traffic class.
> 
> 0--5 ----> Low priority value
> 6-10 ----> Medium priority value
> 11-15 ----> High priority value
> 
> Benchmark[4] used is multichase.
> 
> Two partition P1 and P2:
> 
> Partition P1:
> -------------
> Assigned core 0
> 100% BW assignment
> 
> Partition P2:
> -------------
> Assigned cores 1-79
> 100% BW assignment
> 
> Test Script:
> -----------
> mkdir p1
> cd p1
> echo 1 > cpus
> echo L3:1=8000,5 > schemata   ##### DSPRI set as 5 (lpr)
> echo "MB:0=100" > schemata
> 
> mkdir p2
> cd p2
> echo ffff,ffffffff,fffffffe > cpus
> echo L3:1=8000,0 > schemata
> echo "MB:0=100" > schemata
> 
> ### Loaded latency run, core 0 does chaseload (pointer chase) with low 
> priority value 5, and cores 1-79 does memory bandwidth run ###

Could you please elaborate what is meant with a "memory bandwidth run"?
[>>] By memory bandwidth run, we meant a memory bandwidth test that measures the data transfer rate between the CPU cores and main memory (the 1G size we chose makes sure that it hits DDR and is not constrained to the caches).

> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
> 
> cd /sys/fs/resctrl/p1
> 
> echo L3:1=8000,a > schemata  ##### DSPRI set as 0xa (vpr)
> 
> ### Loaded latency run, core 0 does chaseload (pointer chase) with 
> medium priority value a, and cores 1-79 does memory bandwidth run ### 
> ./multiload -v -n 10 -t 80 -m 1G -c chaseload
> 
> cd /sys/fs/resctrl/p1
> 
> echo L3:1=8000,f > schemata  ##### DSPRI set as 0xf (hpr)
> 
> ### Loaded latency run where core 0 does chaseload (pointer chase) 
> with high priority value f, and cores 1-79 does memory bandwidth run 
> ### ./multiload -v -n 10 -t 80 -m 1G -c chaseload
> 
> Results[5]:
> 
> LPR average latency is 204.862(ns) vs VPR average latency is 161.018(ns) vs HPR average latency is 134.210(ns).

Reinette
Reinette Chatre Aug. 23, 2023, 7:06 p.m. UTC | #5
Hi Amit,

On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
> Hi Reinette,
> 
> Thanks for having a look!
> 
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com> 
> Sent: Friday, August 18, 2023 12:41 AM
> To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control
> 
> External Email
> 
> ----------------------------------------------------------------------
> (+Tony)
> 
> Hi Amit,
> 
> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>> Arm Memory System Resource Partitioning and Monitoring (MPAM) supports 
>> different controls that can be applied to different resources in the 
>> system For instance, an optional priority partitioning control where 
>> priority value is generated from one MSC, propagates over interconnect 
>> to other MSC (known as downstream priority), or can be applied within 
>> an MSC for internal operations.
>>
>> Marvell implementation of ARM MPAM supports priority partitioning 
>> control that allows LLC MSC to generate priority values that gets 
>> propagated (along with read/write request from upstream) to DDR Block. 
>> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
>> The link[1] gives some idea about DDR QoS strategy, and terms like 
>> LPR, VPR and HPR.
>>
>> Setup priority partitioning control under Resource control
>> ----------------------------------------------------------
>> At present, resource control (resctrl) provides basic interface to 
>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>> ARM MPAM uses it to support controls like Cache portion partition 
>> (CPOR), and MPAM bandwidth partitioning.
>>
>> As an example, "schemata" file under resource control group contains 
>> information about cache portion bitmaps, and memory bandwidth 
>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>
>> MB:0=0100
>> L3:0=ffff
>>
>> But resctrl doesn't provide a way to set-up other control that ARM 
>> MPAM provides (For instance, Priority partitioning control as 
>> mentioned above). To support this, James has suggested to use already 
>> existing schemata to be compatible with portable software, and this is 
>> the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
>>
>> To support Priority partitioning control, "schemata" file is updated 
>> to accommodate priority field (upon priority partitioning capability 
>> detection), separated from CPBM using delimiter ",".
>>
>> L3:0=ffff,f where f indicates downstream priority max value.
>>
>> These dspri value gets programmed per partition, that can be used to 
>> override QoS value coming from upstream (CPU).
>>
>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>
> 
> There are some aspects of this that I think we should be cautious
> about. First, there may inevitably be more properties in the future
> that need to be associated with a resource allocation, these may
> indeed be different between architectures and individual platforms.
> Second, user space need a way to know which properties are supported
> and what valid parameters may be. 
> 
> On a high level I thus understand the goal be to add support for
> assigning a property to a resource allocation with "Priority
> partitioning control" being the first property.

> To that end, I have a few questions:
> * How can this interface be expanded to support more properties with the
>   expectation that a system/architecture may not support all resctrl supported
>   properties?
> [>>] All these new controls ("Priority partitioning is one of them) detected as resource capabilities (via Features Identification Register), and these control will not be probed, if system/architecture
>         doesn't support it. From resource control side, this means that users will never get to know about the controls from schemata file. For instance, the platform that supports Priority partitioning control
>         schemata file looks like:
> 
>        # cat schemata 
>            L3:1=ffff
> 
>         As oppose to when system has Priority partitioning control
>         # cat schemata 
>            L3:1=ffff,f
>

Right, but my question is "How can this interface be expanded ...".
Consider a future L3 resource that has a new and different property
("new_property") that is independent from "Priority partitioning". 
If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can
a value be assigned to "new_property" if the system's L3 supports
it but not "Priority partitioning"?
If I understand correctly the proposed interface is a positional
interface and "Priority partitioning" is always in second field ...
but a system may or may not support this property so does it require
an empty second field to be able to use other properties?
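
To illustrate the positional concern above, here is a minimal Python sketch. The helper name `parse_positional` and the second platform are hypothetical and only for illustration; this is not resctrl or kernel code. It shows that a purely positional line carries no information about which property the second field holds.

```python
def parse_positional(line):
    """Split a line such as 'L3:1=ffff,f' into (resource, domain id, raw fields)."""
    resource, rest = line.strip().split(":", 1)
    domain, values = rest.split("=", 1)
    return resource, int(domain), values.split(",")


# Platform A: the second field is the "Priority partitioning" value.
fields_a = parse_positional("L3:1=ffff,f")[2]

# Platform B (hypothetical): no priority support, so the second field
# would have to carry "new_property" instead.
fields_b = parse_positional("L3:1=ffff,3")[2]

# Both lines parse to two anonymous fields; the text alone cannot tell
# a reader which property the second field belongs to.
print(fields_a, fields_b)
```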

(fyi ... the quoting used in your response does not make it
obvious what you are responding to)

     
>         
> * Is it possible for support for properties to vary between, for example, different
>   MSCs in the system? From resctrl side it may mean that there would be a resource,
>   for example "L3", with multiple instances, for example, cache with id #0, cache
>   with id#1, etc. but the supported properties or valid values of properties
>   may vary between the instances?
> [>>] This is really implementation dependent but we would expect, if multiple L3 instances
>         across multiple dies implements this control, it should be uniform across, but let's take a case
>         where L3 MSC instance on one socket has this control, and other L3 MSC instance on another 
>         socket doesn't have this control. From resctrl perspective, one would see this control
>         only for L3 instance that has this control, and programmed only for that L3 instance.
> 
>        L3:0=XXXX,X;L3:1=XXXX
> 
>        And as per proposed format:
>    
>        L3:0=XXXX,PPART=X, L3:1=XXXX

I'm a bit lost ... what proposed format?

>        
> * How can user space know that a system supports "Priority partitioning control"?
>   User space needs to know when/if it can attempt to write a priority to the
>   schemata.
> [>>] At the moment, we label only the resource class, and would like to propose we should
>         label newly added controls (under a resource class) as well so that user can easily identify 
>         which control to program. For instance, the schemata file with this proposed changes
>         will look like this:
>         
>         L3:0=XXXX,PPART=X
> 
>        where PPART=Priority partitioning control, Similarly, if L3 resource class has one more capability, say cache capacity partitioning.
> 
>        L3:0=XXXX,PPART=X,CCAP=X
> 
>       Very first control always be CAT/CPOR (with no labels)
>       

Is your response intended to be read from bottom to top?

> * How can user space know what priority values are valid for a particular system?
> [>>] Supported priority values are read from one of the MPAM Priority Partitioning register, and in the
>         Schemata file, it is set to Maximum value just like Cache portion bitmaps or Memory bandwidth allocation.
>         For instance:
>    
>         L3:0=ffff,f, max priority values is f, and user can program/set from 0-15

Doing so would require user space to (a) be running from the
time resctrl is mounted, and (b) maintain state about all
resctrl resources, properties, and supported values.

I think that this is risky and places a burden on user space that
in some scenarios would be impossible to achieve. Consider the
scenario when user space starts running after resctrl has been
in use for a while or if user space loses its state. The
info directory is where information about enabled resources
is located.

>       
> 
>> Test set-up and results:
>> ------------------------
>>
>> The downstream priority value feeds into DRAM controller, and one of 
>> the important thing that it does with this value is to service the 
>> requests sooner (based on the traffic class), hence reducing latency without affecting performance.
> 
> Could you please elaborate here? I expected reduced latency to have a big impact on performance.
> [>>] To be clear, by performance, it meant Memory bandwidth, and with this  specific configuration/test
>        We see priority partitioning as a utility to guarantee lower latency. We are yet to explore its affect
>        On memory bandwidth side.

Please be careful about claims because the above sounds to me as though
this work claims to not affect memory bandwidth but it also
states that the impact on memory bandwidth has not yet been explored.

Reinette
Amit Singh Tomar Aug. 23, 2023, 9:33 p.m. UTC | #6
Hi Reinette,

(Kindly follow the responses in a top-to-bottom sequence).

-----Original Message-----
From: Reinette Chatre <reinette.chatre@intel.com> 
Sent: Thursday, August 24, 2023 12:37 AM
To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

Hi Amit,

On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
> Hi Reinette,
> 
> Thanks for having a look!
> 
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Friday, August 18, 2023 12:41 AM
> To: Amit Singh Tomar <amitsinght@marvell.com>; 
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, 
> Tony <tony.luck@intel.com>
> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
> partitioning control
> 
> (+Tony)
> 
> Hi Amit,
> 
> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>> Arm Memory System Resource Partitioning and Monitoring (MPAM) 
>> supports different controls that can be applied to different 
>> resources in the system For instance, an optional priority 
>> partitioning control where priority value is generated from one MSC, 
>> propagates over interconnect to other MSC (known as downstream 
>> priority), or can be applied within an MSC for internal operations.
>>
>> Marvell implementation of ARM MPAM supports priority partitioning 
>> control that allows LLC MSC to generate priority values that gets 
>> propagated (along with read/write request from upstream) to DDR Block.
>> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
>> The link[1] gives some idea about DDR QoS strategy, and terms like 
>> LPR, VPR and HPR.
>>
>> Setup priority partitioning control under Resource control
>> ----------------------------------------------------------
>> At present, resource control (resctrl) provides basic interface to 
>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>> ARM MPAM uses it to support controls like Cache portion partition 
>> (CPOR), and MPAM bandwidth partitioning.
>>
>> As an example, "schemata" file under resource control group contains 
>> information about cache portion bitmaps, and memory bandwidth 
>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>
>> MB:0=0100
>> L3:0=ffff
>>
>> But resctrl doesn't provide a way to set-up other control that ARM 
>> MPAM provides (For instance, Priority partitioning control as 
>> mentioned above). To support this, James has suggested to use already 
>> existing schemata to be compatible with portable software, and this 
>> is the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
>>
>> To support Priority partitioning control, "schemata" file is updated 
>> to accommodate priority field (upon priority partitioning capability 
>> detection), separated from CPBM using delimiter ",".
>>
>> L3:0=ffff,f where f indicates downstream priority max value.
>>
>> These dspri value gets programmed per partition, that can be used to 
>> override QoS value coming from upstream (CPU).
>>
>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>
> 
> There are some aspects of this that I think we should be cautious 
> about. First, there may inevitably be more properties in the future 
> that need to be associated with a resource allocation, these may 
> indeed be different between architectures and individual platforms.
> Second, user space need a way to know which properties are supported 
> and what valid parameters may be.
> 
> On a high level I thus understand the goal be to add support for 
> assigning a property to a resource allocation with "Priority 
> partitioning control" being the first property.

> To that end, I have a few questions:
> * How can this interface be expanded to support more properties with the
>   expectation that a system/architecture may not support all resctrl supported
>   properties?
> [>>] All these new controls ("Priority partitioning is one of them) detected as resource capabilities (via Features Identification Register), and these control will not be probed, if system/architecture
>         doesn't support it. From resource control side, this means that users will never get to know about the controls from schemata file. For instance, the platform that supports Priority partitioning control
>         schemata file looks like:
> 
>        # cat schemata 
>            L3:1=ffff
> 
>         As oppose to when system has Priority partitioning control
>         # cat schemata 
>            L3:1=ffff,f
>

Right, but my question is "How can this interface be expanded ...".
Consider a future L3 resource that has a new and different property
("new_property") that is independent from "Priority partitioning". 
If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can a value be assigned to "new_property" if the system's L3 supports it but not "Priority partitioning"?
If I understand correctly the proposed interface is a positional interface and "Priority partitioning" is always in second field ...

[>>] Yes, "Priority partitioning" will always be the second field.

but a system may or may not support this property so does it require an empty second field to be able to use other properties?

[>>] Yes, in the absence of this control ("Priority partitioning"), the second field will be taken by another control (if supported).

So, for example, if the L3 resource is equipped with two controls, i.e. CPOR and PPART, the schemata will look like:

         L3:0=XXXX,PPART=X

and, if the same resource is equipped with another set of controls, i.e. CPOR and CCAP (cache capacity partitioning), the schemata will look like:

         L3:0=XXXX,CCAP=X

and, in case the resource is equipped with all three controls, the schemata will look like:

        L3:0=XXXX,PPART=X,CCAP=X

Each of these combinations has its own format specifier.
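
A rough Python sketch of how user space might parse the labeled format described above. The function name `parse_labeled`, the use of "CPOR" as the dictionary key for the unlabeled first field, and the assumption that all values are hexadecimal are illustrative choices, not part of the RFC.

```python
def parse_labeled(entry):
    """Parse one domain entry such as 'L3:0=ffff,PPART=f,CCAP=3'."""
    resource, rest = entry.strip().split(":", 1)
    domain, values = rest.split("=", 1)
    first, *labeled = values.split(",")
    # The very first control is unlabeled (CAT/CPOR bitmap).
    controls = {"CPOR": int(first, 16)}
    for item in labeled:
        label, value = item.split("=", 1)
        controls[label] = int(value, 16)  # assumption: values are hex
    return resource, int(domain), controls


print(parse_labeled("L3:0=ffff,PPART=f,CCAP=3"))
print(parse_labeled("L3:0=ffff,CCAP=3"))
```

Because each extra control carries its own label, the second call still identifies CCAP correctly even though PPART is absent.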
    
>         
> * Is it possible for support for properties to vary between, for example, different
>   MSCs in the system? From resctrl side it may mean that there would be a resource,
>   for example "L3", with multiple instances, for example, cache with id #0, cache
>   with id#1, etc. but the supported properties or valid values of properties
>   may vary between the instances?
> [>>] This is really implementation dependent but we would expect, if multiple L3 instances
>         across multiple dies implements this control, it should be uniform across, but let's take a case
>         where L3 MSC instance on one socket has this control, and other L3 MSC instance on another 
>         socket doesn't have this control. From resctrl perspective, one would see this control
>         only for L3 instance that has this control, and programmed only for that L3 instance.
> 
>        L3:0=XXXX,X;L3:1=XXXX
> 
>        And as per proposed format:
>    
>        L3:0=XXXX,PPART=X, L3:1=XXXX

I'm a bit lost ... what proposed format?
[>>] Sorry about that, I should have indicated the proposed format is in the point below.

>        
> * How can user space know that a system supports "Priority partitioning control"?
>   User space needs to know when/if it can attempt to write a priority to the
>   schemata.
> [>>] At the moment, we label only the resource class, and would like to propose we should
>         label newly added controls (under a resource class) as well so that user can easily identify 
>         which control to program. For instance, the schemata file with this proposed changes
>         will look like this:
>         
>         L3:0=XXXX,PPART=X
> 
>        where PPART=Priority partitioning control, Similarly, if L3 resource class has one more capability, say cache capacity partitioning.
> 
>        L3:0=XXXX,PPART=X,CCAP=X
> 
>       Very first control always be CAT/CPOR (with no labels)
>       

Is your response intended to be read from bottom to top?

> * How can user space know what priority values are valid for a particular system?
> [>>] Supported priority values are read from one of the MPAM Priority Partitioning register, and in the
>         Schemata file, it is set to Maximum value just like Cache portion bitmaps or Memory bandwidth allocation.
>         For instance:
>    
>         L3:0=ffff,f, max priority values is f, and user can program/set from 0-15

Doing so would require user space to (a) be running from the time resctrl is mounted, and (b) maintain state about all resctrl resources, properties, and supported values.

I think that this is risky and places a burden on user space that in some scenarios would be impossible to achieve. Consider the scenario when user space starts running after resctrl has been in use for a while or if user space loses its state. The info directory is where information about enabled resources is located.

[>>] Thanks for pointing it out; I will export this information to the info directory.
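
As a sketch of what discovery through the info directory could look like from user space: the helper names below are hypothetical, and `max_priority` in particular is an invented file name used only for illustration, since the thread does not define how the priority range would be exported.

```python
from pathlib import Path


def read_info(info_dir, resource):
    """Return {file name: contents} for each regular file under <info_dir>/<resource>/."""
    res_dir = Path(info_dir) / resource
    return {p.name: p.read_text().strip() for p in res_dir.iterdir() if p.is_file()}


def max_priority(info):
    """Return the maximum priority, or None if the control is not advertised."""
    # "max_priority" is a hypothetical file name, not an existing resctrl file.
    raw = info.get("max_priority")
    return int(raw, 16) if raw is not None else None
```

With resctrl mounted at its usual location, a caller would do something like `max_priority(read_info("/sys/fs/resctrl/info", "L3"))` and treat `None` as "priority partitioning not supported", without having to track state from mount time.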

>       
> 
>> Test set-up and results:
>> ------------------------
>>
>> The downstream priority value feeds into DRAM controller, and one of 
>> the important thing that it does with this value is to service the 
>> requests sooner (based on the traffic class), hence reducing latency without affecting performance.
> 
> Could you please elaborate here? I expected reduced latency to have a big impact on performance.
> [>>] To be clear, by performance, it meant Memory bandwidth, and with this  specific configuration/test
>        We see priority partitioning as a utility to guarantee lower latency. We are yet to explore its affect
>        On memory bandwidth side.

Please be careful about claims because the above sounds to me as though this work claims to not affect memory bandwidth but it also states that the impact on memory bandwidth has not yet been explored.
[>>] Sure, I will be more careful with my wording, but the previous statement "hence reducing latency without affecting performance" is based on
the test results we presented. For instance, the bandwidth numbers across the priority values are almost the same, ~345 GB/s.

Thanks
-Amit
Reinette Chatre Aug. 23, 2023, 10:20 p.m. UTC | #7
Hi Amit,

On 8/23/2023 2:33 PM, Amit Singh Tomar wrote:
> Hi Reinette,
> 
> (Kindly follow the responses in a top-to-bottom sequence).
> 
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com> 
> Sent: Thursday, August 24, 2023 12:37 AM
> To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
> Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control
> 
> Hi Amit,
> 
> On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
>> Hi Reinette,
>>
>> Thanks for having a look!
>>
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@intel.com>
>> Sent: Friday, August 18, 2023 12:41 AM
>> To: Amit Singh Tomar <amitsinght@marvell.com>; 
>> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
>> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, 
>> Tony <tony.luck@intel.com>
>> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
>> partitioning control
>>
>> (+Tony)
>>
>> Hi Amit,
>>
>> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>>> Arm Memory System Resource Partitioning and Monitoring (MPAM) 
>>> supports different controls that can be applied to different 
>>> resources in the system For instance, an optional priority 
>>> partitioning control where priority value is generated from one MSC, 
>>> propagates over interconnect to other MSC (known as downstream 
>>> priority), or can be applied within an MSC for internal operations.
>>>
>>> Marvell implementation of ARM MPAM supports priority partitioning 
>>> control that allows LLC MSC to generate priority values that gets 
>>> propagated (along with read/write request from upstream) to DDR Block.
>>> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
>>> The link[1] gives some idea about DDR QoS strategy, and terms like 
>>> LPR, VPR and HPR.
>>>
>>> Setup priority partitioning control under Resource control
>>> ----------------------------------------------------------
>>> At present, resource control (resctrl) provides basic interface to 
>>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>>> ARM MPAM uses it to support controls like Cache portion partition 
>>> (CPOR), and MPAM bandwidth partitioning.
>>>
>>> As an example, "schemata" file under resource control group contains 
>>> information about cache portion bitmaps, and memory bandwidth 
>>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>>
>>> MB:0=0100
>>> L3:0=ffff
>>>
>>> But resctrl doesn't provide a way to set-up other control that ARM 
>>> MPAM provides (For instance, Priority partitioning control as 
>>> mentioned above). To support this, James has suggested to use already 
>>> existing schemata to be compatible with portable software, and this 
>>> is the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
>>>
>>> To support Priority partitioning control, "schemata" file is updated 
>>> to accommodate priority field (upon priority partitioning capability 
>>> detection), separated from CPBM using delimiter ",".
>>>
>>> L3:0=ffff,f where f indicates downstream priority max value.
>>>
>>> These dspri value gets programmed per partition, that can be used to 
>>> override QoS value coming from upstream (CPU).
>>>
>>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
>>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>>
>>
>> There are some aspects of this that I think we should be cautious 
>> about. First, there may inevitably be more properties in the future 
>> that need to be associated with a resource allocation, these may 
>> indeed be different between architectures and individual platforms.
>> Second, user space need a way to know which properties are supported 
>> and what valid parameters may be.
>>
>> On a high level I thus understand the goal be to add support for 
>> assigning a property to a resource allocation with "Priority 
>> partitioning control" being the first property.
> 
>> To that end, I have a few questions:
>> * How can this interface be expanded to support more properties with the
>>   expectation that a system/architecture may not support all resctrl supported
>>   properties?
>> [>>] All these new controls ("Priority partitioning is one of them) detected as resource capabilities (via Features Identification Register), and these control will not be probed, if system/architecture
>>         doesn't support it. From resource control side, this means that users will never get to know about the controls from schemata file. For instance, the platform that supports Priority partitioning control
>>         schemata file looks like:
>>
>>        # cat schemata 
>>            L3:1=ffff
>>
>>         As oppose to when system has Priority partitioning control
>>         # cat schemata 
>>            L3:1=ffff,f
>>
> 
> Right, but my question is "How can this interface be expanded ...".
> Consider a future L3 resource that has a new and different property
> ("new_property") that is independent from "Priority partitioning". 
> If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can a value be assigned to "new_property" if the system's L3 supports it but not "Priority partitioning"?
> If I understand correctly the proposed interface is a positional interface and "Priority partitioning" is always in second field ...
> 
> [>>] Yes, "Priority partitioning" will always be the second field.
> 
> but a system may or may not support this property so does it require an empty second field to be able to use other properties?
> 
> [>>] Yes, in the absence of this control ("Priority partitioning"), second field will be taken by other control (if supported).
> 
> So, for example, if L3 resource is equipped with two controls, .i.e. CPOR and PPART, schemata will look like:
> 
>          L3:0=XXXX,PPART=X
> 
> and, if same resource is equipped with another set of controls, .i.e. CPOR and CCAP (cache capacity partitioning), schemata will look like:
> 
>          L3:0=XXXX,CCAP=X
> 
> and, in case resource is equipped with all three controls, schemata will look like:
> 
>         L3:0=XXXX,PPART=X,CCAP=X
> 
> Each of these combinations, features its own format specifier.
>     

I see. I do have a similar concern as Peter regarding the impact of
this change on parsing of the schemata file. I peeked at intel-cmt-cat's
implementation [1] and if I understand it correctly these changes will
break it. This is just one example but I do think this will have
significant impact on user space that should be avoided.
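
One way an existing parser can break is sketched below in Python. This is a hypothetical strict parser written for illustration, not intel-cmt-cat's actual code; it assumes the tool expects exactly one hexadecimal mask after the `=`.

```python
def parse_cbm_only(entry):
    """Legacy-style parse: expects '<resource>:<id>=<hex mask>' and nothing else."""
    resource, rest = entry.strip().split(":", 1)
    domain, mask = rest.split("=", 1)
    return resource, int(domain), int(mask, 16)


print(parse_cbm_only("L3:0=ffff"))  # today's format parses fine

for extended in ("L3:0=ffff,f", "L3:0=ffff,PPART=f"):
    try:
        parse_cbm_only(extended)
    except ValueError as err:
        # The extra field makes the mask string invalid hexadecimal.
        print("rejected:", extended, "->", err)
```

A parser written differently might instead silently ignore the extra field, which would be a different kind of breakage.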

Apart from this, this discussion focused on the display of properties when
user views the schemata file. We also need to consider
how the user will provide new data by writing to the schemata file.
For example, I do not think it is convenient for the user to
have to provide the allocation bitmask every time the
"Priority partitioning" value needs to be changed for a resource
instance. This may also be solved when considering Peter's idea but
since this work depends on other work that is not upstream it
is difficult to envision the impact of any suggestions.
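
To make the write-side concern concrete, here is a small Python sketch of the read-modify-write step user space would need under the positional `mask,priority` format if only the priority is to change. The helper name `with_new_priority` is hypothetical, and the sketch handles a single domain entry only.

```python
def with_new_priority(current_line, new_priority):
    """Build the line to write back, re-supplying the existing bitmask field."""
    head, values = current_line.strip().split("=", 1)
    mask = values.split(",")[0]  # the bitmask must be read back and repeated
    return f"{head}={mask},{new_priority:x}"


print(with_new_priority("L3:0=ffff,f", 3))
```

The caller has to read the current schemata first purely to echo the unchanged mask, which is the inconvenience described above.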

Reinette

[1] https://github.com/intel/intel-cmt-cat/blob/master/lib/resctrl_schemata.c#L495
Tony Luck Aug. 23, 2023, 10:36 p.m. UTC | #8
> I see. I do have a similar concern as Peter regarding the impact of
> this change on parsing of the schemata file. I peeked at intel-cmt-cat's
> implementation [1] and if I understand it correctly these changes will
> break it. This is just one example but I do think this will have
> significant impact on user space that should be avoided.
>
> Apart from this, this discussion focused on the display of properties when
> user views the schemata file. We also need to consider
> how the user will provide new data by writing to the schemata file.
> For example, I do not think it is convenient for the user to
> have to provide the allocation bitmask every time the
> "Priority partitioning" value needs to be changed for a resource
> instance. This may also be solved when considering Peter's idea but
> since this work depends on other work that is not upstream it
> is difficult to envision the impact of any suggestions.

Would it be better to add additional files? E.g. keep the syntax of
the schemata file the same. Just specifying the cache allocation
bitmask for each cache instance.

Then have a separate file (or files) for these additional attributes
like PPART and CCAP.

How are these likely to be used in practice? Would a user need to
update all of these at once (in which case separate files would be
inconvenient)? Or is it likely that updates to mask, PPART, CCAP
are orthogonal, and so updates are not usually done together?

-Tony
Amit Singh Tomar Aug. 24, 2023, 8:52 a.m. UTC | #9
Hi Reinette,

Thanks for your prompt response.

-----Original Message-----
From: Reinette Chatre <reinette.chatre@intel.com> 
Sent: Thursday, August 24, 2023 3:50 AM
To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control

Hi Amit,

On 8/23/2023 2:33 PM, Amit Singh Tomar wrote:
> Hi Reinette,
> 
> (Kindly follow the responses in a top-to-bottom sequence).
> 
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, August 24, 2023 12:37 AM
> To: Amit Singh Tomar <amitsinght@marvell.com>; 
> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, 
> Tony <tony.luck@intel.com>
> Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
> partitioning control
> 
> Hi Amit,
> 
> On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
>> Hi Reinette,
>>
>> Thanks for having a look!
>>
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@intel.com>
>> Sent: Friday, August 18, 2023 12:41 AM
>> To: Amit Singh Tomar <amitsinght@marvell.com>; 
>> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
>> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; 
>> Luck, Tony <tony.luck@intel.com>
>> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
>> partitioning control
>>
>> (+Tony)
>>
>> Hi Amit,
>>
>> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>>> Arm Memory System Resource Partitioning and Monitoring (MPAM) 
>>> supports different controls that can be applied to different 
>>> resources in the system For instance, an optional priority 
>>> partitioning control where priority value is generated from one MSC, 
>>> propagates over interconnect to other MSC (known as downstream 
>>> priority), or can be applied within an MSC for internal operations.
>>>
>>> Marvell implementation of ARM MPAM supports priority partitioning 
>>> control that allows LLC MSC to generate priority values that gets 
>>> propagated (along with read/write request from upstream) to DDR Block.
>>> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
>>> The link[1] gives some idea about DDR QoS strategy, and terms like 
>>> LPR, VPR and HPR.
>>>
>>> Setup priority partitioning control under Resource control
>>> ----------------------------------------------------------
>>> At present, resource control (resctrl) provides basic interface to 
>>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>>> ARM MPAM uses it to support controls like Cache portion partition 
>>> (CPOR), and MPAM bandwidth partitioning.
>>>
>>> As an example, "schemata" file under resource control group contains 
>>> information about cache portion bitmaps, and memory bandwidth 
>>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>>
>>> MB:0=0100
>>> L3:0=ffff
>>>
>>> But resctrl doesn't provide a way to set-up other control that ARM 
>>> MPAM provides (For instance, Priority partitioning control as 
>>> mentioned above). To support this, James has suggested to use 
>>> already existing schemata to be compatible with portable software, 
>>> and this is the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
>>>
>>> To support Priority partitioning control, "schemata" file is updated 
>>> to accommodate priority field (upon priority partitioning capability 
>>> detection), separated from CPBM using delimiter ",".
>>>
>>> L3:0=ffff,f where f indicates downstream priority max value.
>>>
>>> These dspri value gets programmed per partition, that can be used to 
>>> override QoS value coming from upstream (CPU).
>>>
>>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
>>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>>
>>
>> There are some aspects of this that I think we should be cautious 
>> about. First, there may inevitably be more properties in the future 
>> that need to be associated with a resource allocation, these may 
>> indeed be different between architectures and individual platforms.
>> Second, user space need a way to know which properties are supported 
>> and what valid parameters may be.
>>
>> On a high level I thus understand the goal be to add support for 
>> assigning a property to a resource allocation with "Priority 
>> partitioning control" being the first property.
> 
>> To that end, I have a few questions:
>> * How can this interface be expanded to support more properties with the
>>   expectation that a system/architecture may not support all resctrl supported
>>   properties?
>> [>>] All these new controls ("Priority partitioning is one of them) detected as resource capabilities (via Features Identification Register), and these control will not be probed, if system/architecture
>>         doesn't support it. From resource control side, this means that users will never get to know about the controls from schemata file. For instance, the platform that supports Priority partitioning control
>>         schemata file looks like:
>>
>>        # cat schemata 
>>            L3:1=ffff
>>
>>         As oppose to when system has Priority partitioning control
>>         # cat schemata 
>>            L3:1=ffff,f
>>
> 
> Right, but my question is "How can this interface be expanded ...".
> Consider a future L3 resource that has a new and different property
> ("new_property") that is independent from "Priority partitioning". 
> If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can a value be assigned to "new_property" if the system's L3 supports it but not "Priority partitioning"?
> If I understand correctly the proposed interface is a positional interface and "Priority partitioning" is always in second field ...
> 
> [>>] Yes, "Priority partitioning" will always be the second field.
> 
> but a system may or may not support this property so does it require an empty second field to be able to use other properties?
> 
> [>>] Yes, in the absence of this control ("Priority partitioning"), second field will be taken by other control (if supported).
> 
> So, for example, if L3 resource is equipped with two controls, .i.e. CPOR and PPART, schemata will look like:
> 
>          L3:0=XXXX,PPART=X
> 
> and, if same resource is equipped with another set of controls, .i.e. CPOR and CCAP (cache capacity partitioning), schemata will look like:
> 
>          L3:0=XXXX,CCAP=X
> 
> and, in case resource is equipped with all three controls, schemata will look like:
> 
>         L3:0=XXXX,PPART=X,CCAP=X
> 
> Each of these combinations, features its own format specifier.
>     

I see. I do have a similar concern as Peter regarding the impact of this change on parsing of the schemata file. I peeked at intel-cmt-cat's implementation [1] and if I understand it correctly these changes will break it. This is just one example but I do think this will have significant impact on user space that should be avoided.

[>>] To be honest, I don't see how it breaks things on the x86 side. None of these new controls (PPART or CCAP) exist on Intel platforms, and in the absence of these controls the schemata file remains the same, i.e.
        L3:0=ffff

       Or are you talking about the situation where Intel may later have a similar control, and this proposed approach would break intel-cmt-cat then?

Apart from this, this discussion focused on the display of properties when the user views the schemata file. We also need to consider how the user will provide new data by writing to the schemata file.
For example, I do not think it is convenient for the user to have to provide the allocation bitmask every time the "Priority partitioning" value needs to be changed for a resource instance. 

[>>] This is something I was pondering: not having to provide the allocation bitmask when changing the "Priority partitioning" value, or vice versa. However, the ARM MPAM device driver
runs through all the resource instances (learned from the ACPI table) and programs the ris_idx (along with the partid) into MPAMCFG_PART_SEL_NS[RIS].
After that, it programs the portion bitmap (related to CPOR) or the priority value (depending on the ris_idx) into MPAMCFG_CPBM_NS or MPAMCFG_PRI_NS[dspri].

As an example, for resource index 0 (MPAMCFG_PART_SEL_NS[0]) it programs the priority value, and for resource index 1 (MPAMCFG_PART_SEL_NS[1]) it programs the portion bitmap value. In a way, the driver[1]
expects both of these values to be supplied. Maybe James can correct me here.

This may also be solved when considering Peter's idea but since this work depends on other work that is not upstream it is difficult to envision the impact of any suggestions.

[>>] Initially, we thought about these three approaches:

1) Populate the resource control filesystem[2] with a new file that corresponds to the new control. It requires the priority value to be encoded around portion bitmaps, and James suggested we should go with the
   "schemata" file approach.

   I think this is something Tony pointed out in the other thread.

2) The second approach, which we discussed internally, is to have the schemata for CPOR and PPART on separate lines, as suggested by Peter, but it may require tweaking
   the ARM MPAM device driver a bit. It was kind of a toss-up between the 2nd and 3rd approaches :), and we went with the 3rd one.

   L3:0=XXXX
   L3:0=PPART=X

   Will look into it again.

3) This is the approach we presented in this RFC.

Thanks,
-Amit

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/platform/mpam/mpam_devices.c?h=mpam/snapshot/v6.5-rc1#n1175
[2]: https://github.com/Amit-Radur/linux/commit/5b603e282c6e15a79ae03b2ed4882b672724c018
Tony Luck Aug. 24, 2023, 3:30 p.m. UTC | #10
> 2) Second approach that we discussed internally is to have schemata for CPOR, and PPART separated by new line as mentioned/suggested by Peter, But it may require to tweak
>    the ARM MPAM device driver a bit. It was kind of toss-up between 2nd and 3rd approach :), and we went with the 3rd one.
>
>    L3:0=XXXX
>    L3:0=PPART=X
>
>    Will look into it again.

That looks hard to parse. How about:

L3:0=XXX;1=YYY
L3PPART:0=AAA;1=BBB
L3CPOR:0=MMM;1=NNN
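
A minimal sketch of how a user-space reader might handle lines in this shape, assuming every line is `NAME:id=value;id=value` and values are kept as opaque strings. The function name `parse_schemata` is hypothetical; this is not code from resctrl or from any existing tool.

```python
def parse_schemata(text):
    """Parse 'NAME:id=value;id=value' lines into {name: {domain_id: value}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything before the first ':' is the schema name (L3, L3PPART, ...).
        name, _, body = line.partition(":")
        domains = {}
        for entry in body.split(";"):
            # Each entry is '<domain id>=<value>'; the value stays a string.
            dom_id, _, value = entry.partition("=")
            domains[int(dom_id)] = value
        result[name.strip()] = domains
    return result


sample = "L3:0=XXX;1=YYY\nL3PPART:0=AAA;1=BBB\nL3CPOR:0=MMM;1=NNN\n"
parsed = parse_schemata(sample)
```

Because every control has its own prefix, the same loop handles all three lines without any per-control special cases.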

-Tony
Reinette Chatre Aug. 24, 2023, 6 p.m. UTC | #11
Hi Amit,

On 8/24/2023 1:52 AM, Amit Singh Tomar wrote:
> Hi Reinette,
> 
> Thanks for your prompt response.
> 
> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com> 
> Sent: Thursday, August 24, 2023 3:50 AM
> To: Amit Singh Tomar <amitsinght@marvell.com>; linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, Tony <tony.luck@intel.com>
> Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority partitioning control
> 
> Hi Amit,
> 
> On 8/23/2023 2:33 PM, Amit Singh Tomar wrote:
>> Hi Reinette,
>>
>> (Kindly follow the responses in a top-to-bottom sequence).
>>
>> -----Original Message-----
>> From: Reinette Chatre <reinette.chatre@intel.com>
>> Sent: Thursday, August 24, 2023 12:37 AM
>> To: Amit Singh Tomar <amitsinght@marvell.com>; 
>> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
>> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; Luck, 
>> Tony <tony.luck@intel.com>
>> Subject: Re: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
>> partitioning control
>>
>> Hi Amit,
>>
>> On 8/22/2023 5:44 AM, Amit Singh Tomar wrote:
>>> Hi Reinette,
>>>
>>> Thanks for having a look!
>>>
>>> -----Original Message-----
>>> From: Reinette Chatre <reinette.chatre@intel.com>
>>> Sent: Friday, August 18, 2023 12:41 AM
>>> To: Amit Singh Tomar <amitsinght@marvell.com>; 
>>> linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org
>>> Cc: fenghua.yu@intel.com; james.morse@arm.com; George Cherian 
>>> <gcherian@marvell.com>; robh@kernel.org; peternewman@google.com; 
>>> Luck, Tony <tony.luck@intel.com>
>>> Subject: [EXT] Re: [RFC 00/12] ARM: MPAM: add support for priority 
>>> partitioning control
>>>
>>> (+Tony)
>>>
>>> Hi Amit,
>>>
>>> On 8/15/2023 8:27 AM, Amit Singh Tomar wrote:
>>>> Arm Memory System Resource Partitioning and Monitoring (MPAM) 
>>>> supports different controls that can be applied to different 
>>>> resources in the system For instance, an optional priority 
>>>> partitioning control where priority value is generated from one MSC, 
>>>> propagates over interconnect to other MSC (known as downstream 
>>>> priority), or can be applied within an MSC for internal operations.
>>>>
>>>> Marvell implementation of ARM MPAM supports priority partitioning 
>>>> control that allows LLC MSC to generate priority values that gets 
>>>> propagated (along with read/write request from upstream) to DDR Block.
>>>> Within the DDR block the priority values is mapped to different traffic class under DDR QoS strategy.
>>>> The link[1] gives some idea about DDR QoS strategy, and terms like 
>>>> LPR, VPR and HPR.
>>>>
>>>> Setup priority partitioning control under Resource control
>>>> ----------------------------------------------------------
>>>> At present, resource control (resctrl) provides basic interface to 
>>>> configure/set-up CAT (Cache Allocation Technology) and MBA (Memory Bandwidth Allocation) capabilities.
>>>> ARM MPAM uses it to support controls like Cache portion partition 
>>>> (CPOR), and MPAM bandwidth partitioning.
>>>>
>>>> As an example, "schemata" file under resource control group contains 
>>>> information about cache portion bitmaps, and memory bandwidth 
>>>> allocation, and these are used to configure Cache portion partition (CPOR), and MPAM bandwidth partitioning controls.
>>>>
>>>> MB:0=0100
>>>> L3:0=ffff
>>>>
>>>> But resctrl doesn't provide a way to set-up other control that ARM 
>>>> MPAM provides (For instance, Priority partitioning control as 
>>>> mentioned above). To support this, James has suggested to use 
>>>> already existing schemata to be compatible with portable software, 
>>>> and this is the main idea behind this RFC is to have some kind of discussion on how resctrl can be extended to support priority partitioning control.
>>>>
>>>> To support Priority partitioning control, "schemata" file is updated 
>>>> to accommodate priority field (upon priority partitioning capability 
>>>> detection), separated from CPBM using delimiter ",".
>>>>
>>>> L3:0=ffff,f where f indicates downstream priority max value.
>>>>
>>>> These dspri value gets programmed per partition, that can be used to 
>>>> override QoS value coming from upstream (CPU).
>>>>
>>>> RFC patch-set[2] is based on James Morse's MPAM snapshot[3] for 6.2, 
>>>> and ACPI table is based on DEN0065A_MPAM_ACPI_2.0.
>>>>
>>>
>>> There are some aspects of this that I think we should be cautious 
>>> about. First, there may inevitably be more properties in the future 
>>> that need to be associated with a resource allocation, these may 
>>> indeed be different between architectures and individual platforms.
>>> Second, user space need a way to know which properties are supported 
>>> and what valid parameters may be.
>>>
>>> On a high level I thus understand the goal be to add support for 
>>> assigning a property to a resource allocation with "Priority 
>>> partitioning control" being the first property.
>>
>>> To that end, I have a few questions:
>>> * How can this interface be expanded to support more properties with the
>>>   expectation that a system/architecture may not support all resctrl supported
>>>   properties?
>>> [>>] All these new controls ("Priority partitioning" is one of them) are detected as resource capabilities (via the Features Identification Register), and a control will not be probed if the system/architecture
>>>         doesn't support it. From the resource control side, this means that users will never see unsupported controls in the schemata file. For instance, on a platform that does not support Priority partitioning control,
>>>         the schemata file looks like:
>>>
>>>        # cat schemata 
>>>            L3:1=ffff
>>>
>>>         As opposed to when the system has Priority partitioning control:
>>>         # cat schemata 
>>>            L3:1=ffff,f
>>>
>>
>> Right, but my question is "How can this interface be expanded ...".
>> Consider a future L3 resource that has a new and different property
>> ("new_property") that is independent from "Priority partitioning". 
>> If "L3:1=ffff,f" means "Priority partitioning" == 0xf, how can a value be assigned to "new_property" if the system's L3 supports it but not "Priority partitioning"?
>> If I understand correctly the proposed interface is a positional interface and "Priority partitioning" is always in second field ...
>>
>> [>>] Yes, "Priority partitioning" will always be the second field.
>>
>> but a system may or may not support this property so does it require an empty second field to be able to use other properties?
>>
>> [>>] Yes, in the absence of this control ("Priority partitioning"), second field will be taken by other control (if supported).
>>
>> So, for example, if L3 resource is equipped with two controls, .i.e. CPOR and PPART, schemata will look like:
>>
>>          L3:0=XXXX,PPART=X
>>
>> and, if same resource is equipped with another set of controls, .i.e. CPOR and CCAP (cache capacity partitioning), schemata will look like:
>>
>>          L3:0=XXXX,CCAP=X
>>
>> and, in case resource is equipped with all three controls, schemata will look like:
>>
>>         L3:0=XXXX,PPART=X,CCAP=X
>>
>> Each of these combinations, features its own format specifier.
>>     
> 
> I see. I do have a similar concern as Peter regarding the impact of this change on parsing of the schemata file. I peeked at intel-cmt-cat's implementation [1] and if I understand it correctly these changes will break it. This is just one example but I do think this will have significant impact on user space that should be avoided.[>>] 
> 
> [>>] To be honest, I don't see how it breaks things on x86 side. None of these new controls (PPART, or CCAP) exist for intel platform, and in absence of these control, schemata file remains the same, .i.e.
>         L3:0=ffff
> 
>        Or you're talking about the situation when intel may have similar control, and this proposed approach would break intel-cmt-cat then?

There are indeed two parts to this. First, I still consider
this as breaking user space because user space interacts with
"resctrl" that should be a generic interface. Second, yes, any
"resctrl" interface is available to every vendor. It is not expected
that all systems support all features but resctrl is the interface
with which user space can query what features are supported in
order to interact with the features.

> Apart from this this discussion focused on the display of properties when user views the schemata file. We also need to consider how the user will provide new data by writing to the schemata file.
> For example, I do not think it is convenient for the user to have to provide the allocation bitmask every time the "Priority partitioning" value needs to be changed for a resource instance. 
> 
> [>>] This is something, I was pondering about, not to provide allocation bitmask while changing "Priority partitioning" values or vice-versa but ARM MPAM device driver
> run through all the resource instances (learned from ACPI table) and program the ris_idx (along with partid) into MPAMCFG_PART_SEL_NS[RIS].
> After that, programs the portion bit map (related to CPOR), or Priority value (depends on the ris_idx) into MPAMCFG_CPBM_NS or MPAMCFG_PRI_NS[dspri].
> 
> As example, for resource index 0 (MPAMCFG_PART_SEL_NS[0]), it programs Priority value, and for resource index 1 ( MPAMCFG_PART_SEL_NS[1]), it programs portion bitmap value. In a way, Driver[1]
> Expects both these values to be supplied. May be James can correct me here.

I see obtaining the data from user space as separate from
writing the data to the hardware. resctrl maintains the
hardware configuration internally so it is possible to
have user space modify a portion of the configuration
while still being able to write the entire configuration
to hardware if that is required.
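
As a rough illustration of that separation (a sketch only, assuming a per-domain staged configuration holding a cache portion bitmap and a downstream priority; `DomainConfig` and `apply_partial_update` are hypothetical names, not resctrl internals):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DomainConfig:
    cpbm: int   # cache portion bitmap last configured for this domain
    dspri: int  # downstream priority last configured for this domain


def apply_partial_update(staged, domain_id, **changes):
    """Merge only the fields the user supplied into the staged copy and
    return the complete configuration that would be written to hardware."""
    updated = replace(staged[domain_id], **changes)
    staged[domain_id] = updated
    return updated


# The kernel-side copy already holds both values for domain 0.
staged = {0: DomainConfig(cpbm=0xFFFF, dspri=0xF)}

# User space changes only the priority; the bitmap comes from the staged copy.
full = apply_partial_update(staged, 0, dspri=0x3)
```

The returned object carries both fields, so a driver that needs both values on every write can still get them even though the user supplied only one.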


> This may also be solved when considering Peter's idea but since this work depends on other work that is not upstream it is difficult to envision the impact of any suggestions.
> 
> [>>] Initially, we have thought about these three approaches:
> 
> 1) Populate the resource control filesystem[2] with a new file that corresponds to new control. It requires Priority value to be encoded around portion bitmaps, and James has suggested we should go via
>    "schemata" file approach.
> 
>    I think, this is something Tony has pointed out in other thread.

Synchronizing writes to hardware with updates to separate
files may be a challenge.

> 
> 2) Second approach that we discussed internally is to have schemata for CPOR, and PPART separated by new line as mentioned/suggested by Peter, But it may require to tweak
>    the ARM MPAM device driver a bit. It was kind of toss-up between 2nd and 3rd approach :), and we went with the 3rd one.
> 
>    L3:0=XXXX
>    L3:0=PPART=X
> 
>    Will look into it again.

Tony has suggestions here. I think it would be a good exercise to
write a user space client to explore how the interface
can be made most convenient.

Reinette
Jonathan Cameron Sept. 1, 2023, 2:42 p.m. UTC | #12
On Tue, 15 Aug 2023 20:57:00 +0530
Amit Singh Tomar <amitsinght@marvell.com> wrote:

FWIW I've pushed out a QEMU tree with the MPAM patches posted previously
and an additional one enabling DSPRI on all the caches, plus
introspection and some additional sanity checks to pick up on the
DSPRI width bug Amit fixed.

I used that to test this series and it seems fine subject to the TODO
on the final patch.

Note that's a simple model and doesn't actually do anything, but it is easy
to modify to poke corner cases / features you don't have hardware for, etc.

gitlab.com/jic23/qemu 

More info in the qemu patch series RFC cover letter:
https://lore.kernel.org/qemu-devel/20230808115713.2613-1-Jonathan.Cameron@huawei.com/#t
(there is an outstanding build issue for arm32, so don't build that :)

Jonathan
Jonathan Cameron Sept. 1, 2023, 3:04 p.m. UTC | #13
On Tue, 15 Aug 2023 20:57:00 +0530
Amit Singh Tomar <amitsinght@marvell.com> wrote:

> Arm Memory System Resource Partitioning and Monitoring (MPAM) supports
> different controls that can be applied to different resources in the system
> For instance, an optional priority partitioning control where priority
> value is generated from one MSC, propagates over interconnect to other MSC
> (known as downstream priority), or can be applied within an MSC for internal
> operations.

Hi Amit,

I'll mostly leave aside commenting on the actual interface, as lots of discussion has
occurred on that already, so I'll wait for the next version and see where things
ended up :)

As a side note, openEuler has been carrying MPAM patches out of tree for a
long time now and has supported various features that align with available hardware.

The interface is partly described in:
https://github.com/openeuler-mirror/kernel/commit/8139268b70398c37843a38bf8c7b243ad1f20c97

e.g.
   > mount -t resctrl resctrl /sys/fs/resctrl -o mbMax,mbMin,caPrio
   > cd /sys/fs/resctrl && cat schemata
     L3:0=0x7fff;1=0x7fff;2=0x7fff;3=0x7fff #default select cpbm as basic ctrl feature
     L3PRI:0=3;1=3;2=3;3=3
     MBMAX:0=100;1=100;2=100;3=100
     MBMIN:0=0;1=0;2=0;3=0

I'm not sure if this is the latest or not.
> 
> Marvell implementation of ARM MPAM supports priority partitioning control
> that allows LLC MSC to generate priority values that gets propagated (along with
> read/write request from upstream) to DDR Block.

This raises an interesting question of whether we should present these as controls
on the cache, or on the Memory controllers.  This is unlike INTPRI controls which
if present on the caches would definitely make sense presented there in resctrl.

If it were the case that downstream priority controls always just applied to one
block then listing them there (as DDR resource controls) might make sense.
However, the section in the spec on "Through priorities" blocks that option, as
these apply to everything downstream of whichever blocks set the priorities.

So whilst it's confusing, I think you are right in presenting this as part of
the cache resource controls.  For the openEuler kernel that problem hasn't
arisen, as the focus is internal priority in the caches rather than downstream.


> Within the DDR block the
> priority values is mapped to different traffic class under DDR QoS strategy.
> The link[1] gives some idea about DDR QoS strategy, and terms like LPR, VPR
> and HPR.
> 

Jonathan
Peter Newman Jan. 11, 2024, 8:56 p.m. UTC | #14
Hi Amit,

On Thu, Aug 24, 2023 at 1:52 AM Amit Singh Tomar <amitsinght@marvell.com> wrote:

> 2) Second approach that we discussed internally is to have schemata for CPOR, and PPART separated by new line as mentioned/suggested by Peter, But it may require to tweak
>    the ARM MPAM device driver a bit. It was kind of toss-up between 2nd and 3rd approach :), and we went with the 3rd one.
>
>    L3:0=XXXX
>    L3:0=PPART=X
>
>    Will look into it again.

I've been looking into the structure of the MPAM driver to understand
the difficulties here.

It seems the challenge with DSPRI is trying to stuff two different
control schema (partitioning, prioritization) into the L3
rdt_resource. The rdt_resource is still a mix of a hardware component
and user interface (schema line), which leads to the
__resource_props_mismatch() function[1], which makes an arbitrary
choice (driven by resource index order) of which feature should be the
single control presented for each of the rdt_resources, the properties
of which the fields of its rdt_resource entry should describe.

It only seemed to work out for CDP emulation because the properties of
the two schema (L3CODE, L3DATA) for the L3 resource have the same CBM
properties.

My opinion is that the rdt_resource needs to be removed from
fs/resctrl and replaced with a structure to represent a control schema
and another to represent a monitor so that we don't find ourselves
unable to enumerate controls or monitors because a control or monitor
from the same hardware component has already been enumerated.
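
To make the shape of that split concrete, here is a loose sketch in Python rather than kernel C. All names (`ControlSchema`, `Monitor`, `ResctrlRegistry`, the `L3PPART` prefix) are hypothetical and only illustrate enumerating controls by schema rather than by hardware component.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ControlSchema:
    name: str       # schemata line prefix (hypothetical naming)
    component: str  # hardware component that provides the control
    kind: str       # e.g. "bitmap" or "priority"


@dataclass(frozen=True)
class Monitor:
    name: str
    component: str


@dataclass
class ResctrlRegistry:
    controls: Dict[str, ControlSchema] = field(default_factory=dict)
    monitors: Dict[str, Monitor] = field(default_factory=dict)

    def add_control(self, schema: ControlSchema) -> None:
        # Keyed by schema name, so one component can expose several controls.
        if schema.name in self.controls:
            raise ValueError(f"control {schema.name} already enumerated")
        self.controls[schema.name] = schema

    def add_monitor(self, monitor: Monitor) -> None:
        if monitor.name in self.monitors:
            raise ValueError(f"monitor {monitor.name} already enumerated")
        self.monitors[monitor.name] = monitor


registry = ResctrlRegistry()
registry.add_control(ControlSchema("L3", "L3 cache", "bitmap"))
registry.add_control(ControlSchema("L3PPART", "L3 cache", "priority"))
```

Both controls come from the same component, yet neither blocks the other from being enumerated, which is the property the paragraph above asks for.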

[1] https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/platform/mpam/mpam_devices.c?h=mpam/snapshot/v6.7-rc2#n1810

-Peter
Tony Luck Jan. 11, 2024, 9:40 p.m. UTC | #15
On Thu, Jan 11, 2024 at 12:56:34PM -0800, Peter Newman wrote:
> Hi Amit,
> 
> On Thu, Aug 24, 2023 at 1:52 AM Amit Singh Tomar <amitsinght@marvell.com> wrote:
> 
> > 2) Second approach that we discussed internally is to have schemata for CPOR, and PPART separated by new line as mentioned/suggested by Peter, But it may require to tweak
> >    the ARM MPAM device driver a bit. It was kind of toss-up between 2nd and 3rd approach :), and we went with the 3rd one.
> >
> >    L3:0=XXXX
> >    L3:0=PPART=X

I'm not sure having multiple lines for the same resource makes anything
clearer.  I preferred one of the earlier proposals like this one:

	L3:0=XXXX,PPART=X,CCAP=X;1=YYYY,CCAP=Y

This makes the schemata file self enumerate which optional capabilities
are present for each L3 instance (in the above example the second
instance doesn't support PPART, but does support CCAP).

Writes to the schemata file already accept partial information, so
the resctrl schemata_write() function should be coded to allow any of:

Just update CCAP for L3 instance 1:
	# echo "L3:1=CCAP=Z" > schemata

Update mask and CCAP for instance 0:
	# echo "L3:0=ABCD,CCAP=Q" > schemata

Update PPART on all instances:
	# echo "L3:0=PPART=M;1=PPART=N" > schemata

Legacy app that only comprehends partitioning updates instance 1:
	# echo "L3:1=FFFF" > schemata
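
A minimal sketch of a parser that would accept all four writes above, assuming that within a domain entry a bare field is the allocation mask and a `NAME=value` field is an optional property. The function name and the `CBM` key are hypothetical; this is not the kernel's schemata write handler, and values are kept as strings because the examples use placeholders.

```python
def parse_schemata_write(line):
    """Split 'RES:id=fields;id=fields' into (resource, {id: {key: value}})."""
    resource, _, body = line.strip().partition(":")
    updates = {}
    for entry in body.split(";"):
        # Split on the first '=' only, so '1=CCAP=Z' gives id '1' and 'CCAP=Z'.
        dom_id, _, fields = entry.partition("=")
        props = {}
        for item in fields.split(","):
            key, sep, value = item.partition("=")
            if sep:
                props[key] = value    # named property such as PPART or CCAP
            else:
                props["CBM"] = key    # bare field: the legacy allocation mask
        updates[int(dom_id)] = props
    return resource, updates


only_ccap = parse_schemata_write("L3:1=CCAP=Z")
mask_and_ccap = parse_schemata_write("L3:0=ABCD,CCAP=Q")
ppart_all = parse_schemata_write("L3:0=PPART=M;1=PPART=N")
legacy = parse_schemata_write("L3:1=FFFF")
```

Only the keys present in each returned dictionary would be updated; anything omitted keeps its current value.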

-Tony
Reinette Chatre Jan. 11, 2024, 10:01 p.m. UTC | #16
On 1/11/2024 1:40 PM, Tony Luck wrote:
> On Thu, Jan 11, 2024 at 12:56:34PM -0800, Peter Newman wrote:
>> Hi Amit,
>>
>> On Thu, Aug 24, 2023 at 1:52 AM Amit Singh Tomar <amitsinght@marvell.com> wrote:
>>
>>> 2) Second approach that we discussed internally is to have schemata for CPOR, and PPART separated by new line as mentioned/suggested by Peter, But it may require to tweak
>>>    the ARM MPAM device driver a bit. It was kind of toss-up between 2nd and 3rd approach :), and we went with the 3rd one.
>>>
>>>    L3:0=XXXX
>>>    L3:0=PPART=X
> 
> I'm not sure having multiple lines for the same resource makes anything
> clearer.  I preferred one of the earlier proposals like this one:
> 
> 	L3:0=XXXX,PPART=X,CCAP=X;1=YYYY,CCAP=Y

This assumes that all tools (public and private) that currently parse the schemata
file will be able to handle this additional information seamlessly.

Reinette
Tony Luck Jan. 11, 2024, 11:14 p.m. UTC | #17
>> I'm not sure having multiple lines for the same resource makes anything
>> clearer.  I preferred one of the earlier proposals like this one:
>> 
>> 	L3:0=XXXX,PPART=X,CCAP=X;1=YYYY,CCAP=Y
>
> This assumes that all tools (public and private) that currently parse the schemata
> file will be able to handle this additional information seamlessly.

Reinette,

Yes. If there are tools that *read* schemata files, they will be surprised by this extra information.

But that also applies if the "extra" information is moved to a second line that also begins with "L3:".

Tools that *write* schemata files should be OK as long as the kernel will still accept:

  # echo "L3:1=fff" > schemata

E.g. the Linux selftests in tools/testing/selftests/resctrl/ should still run without
any modification.

The "separate line" option could work if the prefix isn't "L3:".  E.g.

L3:0=XXXX;1=YYYY
L3PPART:0=X
L3CCAP:0=X;1=Y

If these options are asymmetrically available on cache instances, these extra
lines won't have every L3 cache instance listed.
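
To illustrate the difference for readers, here is a sketch of a hypothetical legacy reader that only looks at `L3:` lines and parses each value as a hex mask. It is an assumption about how such a tool might be written, not the behaviour of any particular tool, and the hex values stand in for the placeholders above.

```python
def legacy_read_l3(text):
    """Hypothetical legacy reader: expects 'L3:id=hexmask;id=hexmask' only."""
    masks = {}
    for line in text.splitlines():
        name, _, body = line.strip().partition(":")
        if name != "L3":
            continue  # lines with other prefixes are skipped
        for entry in body.split(";"):
            dom_id, _, value = entry.partition("=")
            masks[int(dom_id)] = int(value, 16)  # fails on 'ffff,PPART=3'
    return masks


inline = "L3:0=ffff,PPART=3;1=ffff\n"
separate = "L3:0=ffff;1=ffff\nL3PPART:0=3\n"

try:
    legacy_read_l3(inline)
    inline_ok = True
except ValueError:
    inline_ok = False

separate_masks = legacy_read_l3(separate)
```

Under these assumptions the inline format trips the hex conversion, while the separately prefixed lines are skipped and the masks are read as before. A reader that rejects unknown prefixes instead of skipping them would still be affected.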

-Tony
Reinette Chatre Jan. 11, 2024, 11:31 p.m. UTC | #18
Hi Tony,

On 1/11/2024 3:14 PM, Luck, Tony wrote:
>>> I'm not sure having multiple lines for the same resource makes anything
>>> clearer.  I preferred one of the earlier proposals like this one:
>>>
>>> 	L3:0=XXXX,PPART=X,CCAP=X;1=YYYY,CCAP=Y
>>
>> This assumes that all tools (public and private) that currently parse the schemata
>> file will be able to handle this additional information seamlessly.
> 
> Reinette,
> 
> Yes. If there are tools that *read* schemata files, they will be surprised by this extra information.
> 
> But that also applies if the "extra" information is moved to a second line that also begins with "L3:".
> 
> Tools that *write* schemata files should be OK as long as the kernel will still accept:
> 
>   # echo "L3:1=fff" > schemata
> 
> E.g. the Linux selftests in tools/testing/selftests/resctrl/ should still run without
> any modification.
> 
> The "separate line" option could work if the prefix isn't "L3:".  E.g.
> 
> L3:0=XXXX;1=YYYY
> L3PPART:0=X
> L3CCAP:0=X;1=Y
> 
> If these options are asymmetrically available on cache instances, these extra
> lines won't have every L3 cache instance listed.

I think we are going in circles here. I shared my concern about user space
breakage a while ago [1] in response to your previous proposal and this new proposal
seems to match where this thread ended [2] last year.

Reinette

[1] https://lore.kernel.org/lkml/be51596e-2e62-2fb9-4176-b0b2a2abb1d3@intel.com/
[2] https://lore.kernel.org/lkml/20230901160451.00001f75@Huawei.com/