Message ID | 20241204134410.1161769-1-dev@lankhorst.se (mailing list archive) |
---|---|
Series | kernel/cgroups: Add "dmem" memory accounting cgroup. |
Hi, On 04.12.24 14:44, Maarten Lankhorst wrote: > New update. Instead of calling it the 'dev' cgroup, it's now the 'dmem' cgroup. Thanks! I think this version looks pretty good. > > Because it only deals with memory regions, the UAPI has been updated to use dmem.min/low/max/current, and to make the API cleaner, the names are changed too. > > dmem.current could contain a line like: > "drm/0000:03:00.0/vram0 1073741824" > > But I think using "drm/card0/vram0" instead of PCIID would perhaps be good too. I'm open to changing it to that based on feedback. Agree, allowing userspace to reference DRM devices via "cardN" syntax sounds good. What about other subsystems potentially using dmem cgroups? I'm not familiar with the media subsystem, but I imagine we might be dealing with things like USB devices there? Is something like a "deviceN" possible there as well, or would device IDs look completely different? Regards, Friedrich > > I've created an IGT test for min and max, and found the changes > from Friedrich Vock sent as feedback were needed. > I've integrated those into the first patch. > > Maarten Lankhorst (5): > kernel/cgroup: Add "dmem" memory accounting cgroup > drm/ttm: Handle cgroup based eviction in TTM > drm/xe: Implement cgroup for vram > drm/amdgpu: Add cgroups implementation > drm/xe: Hack to test with mapped pages instead of vram. > > Maxime Ripard (2): > drm/drv: Add drmm managed registration helper for dmem cgroups. > drm/gem: Add cgroup memory accounting for VRAM helper. > > Documentation/admin-guide/cgroup-v2.rst | 58 +- > Documentation/core-api/cgroup.rst | 9 + > Documentation/core-api/index.rst | 1 + > Documentation/gpu/drm-compute.rst | 54 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 4 + > drivers/gpu/drm/drm_drv.c | 32 + > drivers/gpu/drm/drm_gem_vram_helper.c | 15 +- > drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 18 +- > .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 4 +- > drivers/gpu/drm/ttm/tests/ttm_resource_test.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 54 +- > drivers/gpu/drm/ttm/ttm_resource.c | 23 +- > drivers/gpu/drm/xe/xe_ttm_sys_mgr.c | 5 + > drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 8 + > include/drm/drm_drv.h | 5 + > include/drm/ttm/ttm_resource.h | 12 +- > include/linux/cgroup_dmem.h | 67 ++ > include/linux/cgroup_subsys.h | 4 + > include/linux/page_counter.h | 2 +- > init/Kconfig | 10 + > kernel/cgroup/Makefile | 1 + > kernel/cgroup/dmem.c | 861 ++++++++++++++++++ > mm/page_counter.c | 4 +- > 23 files changed, 1219 insertions(+), 34 deletions(-) > create mode 100644 Documentation/core-api/cgroup.rst > create mode 100644 Documentation/gpu/drm-compute.rst > create mode 100644 include/linux/cgroup_dmem.h > create mode 100644 kernel/cgroup/dmem.c >
Hi, Thanks for the new update! On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: > New update. Instead of calling it the 'dev' cgroup, it's now the > 'dmem' cgroup. > > Because it only deals with memory regions, the UAPI has been updated > to use dmem.min/low/max/current, and to make the API cleaner, the > names are changed too. The API is much nicer, and fits much better into other frameworks too. > dmem.current could contain a line like: > "drm/0000:03:00.0/vram0 1073741824" > > But I think using "drm/card0/vram0" instead of PCIID would perhaps be > good too. I'm open to changing it to that based on feedback. Do we have any sort of guarantee over the name card0 being stable across reboots? I also wonder if we should have a "total" device that limits the amount of memory we can allocate from any region? Maxime
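To make the interface under discussion concrete, here is a minimal userspace sketch. It assumes the per-line "<region> <bytes>" layout shown in the cover letter for dmem.current, and assumes dmem.max accepts a "<region> <limit>" line in the same form; the cgroup path and the 512 MiB limit are made up for illustration and not taken from the series.

```c
/*
 * Minimal userspace sketch of the dmem interface.
 * Assumptions not confirmed in this thread: dmem.current exposes one
 * "<region> <bytes>" pair per line, and dmem.max accepts a line in the
 * same "<region> <limit>" form. The cgroup path is made up.
 */
#include <stdio.h>

#define CGROUP "/sys/fs/cgroup/mygroup"

int main(void)
{
	char region[128];
	unsigned long long bytes;
	FILE *f = fopen(CGROUP "/dmem.current", "r");

	if (!f) {
		perror("dmem.current");
		return 1;
	}
	/* e.g. "drm/0000:03:00.0/vram0 1073741824" */
	while (fscanf(f, "%127s %llu", region, &bytes) == 2)
		printf("%s: %llu bytes in use\n", region, bytes);
	fclose(f);

	/* Cap this cgroup's vram0 usage on that device to 512 MiB. */
	f = fopen(CGROUP "/dmem.max", "w");
	if (!f) {
		perror("dmem.max");
		return 1;
	}
	fprintf(f, "drm/0000:03:00.0/vram0 %llu\n", 512ULL << 20);
	fclose(f);
	return 0;
}
```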
On Sun, Dec 08, 2024 at 01:15:34PM +0100, Friedrich Vock wrote: > Hi, > > On 04.12.24 14:44, Maarten Lankhorst wrote: > > > Because it only deals with memory regions, the UAPI has been updated > > to use dmem.min/low/max/current, and to make the API cleaner, the > > names are changed too. > > > > dmem.current could contain a line like: > > "drm/0000:03:00.0/vram0 1073741824" > > > > But I think using "drm/card0/vram0" instead of PCIID would perhaps > > be good too. I'm open to changing it to that based on feedback. > > Agree, allowing userspace to reference DRM devices via "cardN" syntax > sounds good. > > What about other subsystems potentially using dmem cgroups? > I'm not familiar with the media subsystem, but I imagine we might be > dealing with things like USB devices there? Is something like a > "deviceN" possible there as well, or would device IDs look completely > different? I have some patches to enable the cgroup in GEM-based drivers, media ones and dma-buf heaps. The dma-buf heaps are simple enough since the heaps names are supposed to be stable. I don't think using card0 vs card1 (or v4l0 vs v4l1 for example) will work because I don't think we have any sort of guarantee that these names will always point to the same devices across reboots or updates. If the module is loaded later than it used to for example, we could very well end up in a situation where card0 and card1 are swapped, while the constraints apply to the previous situation. Maxime
Hey, Den 2024-12-13 kl. 14:07, skrev Maxime Ripard: > On Sun, Dec 08, 2024 at 01:15:34PM +0100, Friedrich Vock wrote: >> Hi, >> >> On 04.12.24 14:44, Maarten Lankhorst wrote: >> >>> Because it only deals with memory regions, the UAPI has been updated >>> to use dmem.min/low/max/current, and to make the API cleaner, the >>> names are changed too. >>> >>> dmem.current could contain a line like: >>> "drm/0000:03:00.0/vram0 1073741824" >>> >>> But I think using "drm/card0/vram0" instead of PCIID would perhaps >>> be good too. I'm open to changing it to that based on feedback. >> >> Agree, allowing userspace to reference DRM devices via "cardN" syntax >> sounds good. >> >> What about other subsystems potentially using dmem cgroups? >> I'm not familiar with the media subsystem, but I imagine we might be >> dealing with things like USB devices there? Is something like a >> "deviceN" possible there as well, or would device IDs look completely >> different? I'd just take what makes sense for each driver. dev_name() would be a good approximation. I agree that cardN is not stable. > > I have some patches to enable the cgroup in GEM-based drivers, media > ones and dma-buf heaps. The dma-buf heaps are simple enough since the > heaps names are supposed to be stable. I've used your patch as a base to enable the cgroup in drivers that use the VRAM manager. I didn't want to enable it for all of GEM, because it would conflict with drivers using TTM. Some more discussion is needed first. For DMA-BUF heaps, I think it's fine and there is a lot less need for discussion. I just felt it should be sent separately from the initial enablement. > I don't think using card0 vs card1 (or v4l0 vs v4l1 for example) will > work because I don't think we have any sort of guarantee that these > names will always point to the same devices across reboots or updates. > > If the module is loaded later than it used to for example, we could very > well end up in a situation where card0 and card1 are swapped, while the > constraints apply to the previous situation. I agree; I just put it out there for discussion. I don't think the benefits outweigh the downsides :-) Cheers, ~Maarten
Den 2024-12-13 kl. 14:03, skrev Maxime Ripard: > Hi, > > Thanks for the new update! > > On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: >> New update. Instead of calling it the 'dev' cgroup, it's now the >> 'dmem' cgroup. >> >> Because it only deals with memory regions, the UAPI has been updated >> to use dmem.min/low/max/current, and to make the API cleaner, the >> names are changed too. > > The API is much nicer, and fits much better into other frameworks too. > >> dmem.current could contain a line like: >> "drm/0000:03:00.0/vram0 1073741824" >> >> But I think using "drm/card0/vram0" instead of PCIID would perhaps be >> good too. I'm open to changing it to that based on feedback. > > Do we have any sort of guarantee over the name card0 being stable across > reboots? > > I also wonder if we should have a "total" device that limits the amount > of memory we can allocate from any region? I don't think it is useful. Say your app can use 1 GB of main memory or 2 GB of VRAM, it wouldn't make sense to limit the total of those. In a lot of cases there is only 1 region, so the total of that would still be the same. On top, we just separated the management of each region, adding a 'total' would require unseparating it again. :-) I think I'm happy with this version; I don't think more changes to the base are needed. Cheers, ~Maarten
On Fri, Dec 13, 2024 at 03:53:13PM +0100, Maarten Lankhorst wrote: > > > Den 2024-12-13 kl. 14:03, skrev Maxime Ripard: > > Hi, > > > > Thanks for the new update! > > > > On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: > > > New update. Instead of calling it the 'dev' cgroup, it's now the > > > 'dmem' cgroup. > > > > > > Because it only deals with memory regions, the UAPI has been updated > > > to use dmem.min/low/max/current, and to make the API cleaner, the > > > names are changed too. > > > > The API is much nicer, and fits much better into other frameworks too. > > > > > dmem.current could contain a line like: > > > "drm/0000:03:00.0/vram0 1073741824" > > > > > > But I think using "drm/card0/vram0" instead of PCIID would perhaps be > > > good too. I'm open to changing it to that based on feedback. > > > > Do we have any sort of guarantee over the name card0 being stable across > > reboots? > > > > I also wonder if we should have a "total" device that limits the amount > > of memory we can allocate from any region? > > I don't think it is useful. Say your app can use 1 GB of main memory or 2 GB > of VRAM, it wouldn't make sense to limit the total of those. In a lot of > cases there is only 1 region, so the total of that would still be the same. > > On top, we just separated the management of each region, adding a 'total' > would require unseparating it again. :-) I didn't mean the total for a device, but for the system. It would definitely not make sense for a VRAM, but for CMA for example, you have a single, limited, allocator that will be accessible from heaps, v4l2 and DRM devices. If an application has to allocate both from v4l2 and DRM buffers, we should be able to limit its total usage of CMA, not just on a single device. Maxime
On Fri, Dec 13, 2024 at 03:13:23PM +0100, Maarten Lankhorst wrote: > Hey, > > Den 2024-12-13 kl. 14:07, skrev Maxime Ripard: > > On Sun, Dec 08, 2024 at 01:15:34PM +0100, Friedrich Vock wrote: > > > Hi, > > > > > > On 04.12.24 14:44, Maarten Lankhorst wrote: > > > > > > > Because it only deals with memory regions, the UAPI has been updated > > > > to use dmem.min/low/max/current, and to make the API cleaner, the > > > > names are changed too. > > > > > > > > dmem.current could contain a line like: > > > > "drm/0000:03:00.0/vram0 1073741824" > > > > > > > > But I think using "drm/card0/vram0" instead of PCIID would perhaps > > > > be good too. I'm open to changing it to that based on feedback. > > > > > > Agree, allowing userspace to reference DRM devices via "cardN" syntax > > > sounds good. > > > > > > What about other subsystems potentially using dmem cgroups? > > > I'm not familiar with the media subsystem, but I imagine we might be > > > dealing with things like USB devices there? Is something like a > > > "deviceN" possible there as well, or would device IDs look completely > > > different? > > I'd just take what makes sense for each driver. dev_name() would be a good > approximation. Yeah, dev_name() seems good enough to me too. > I agree that cardN is not stable. > > > > I have some patches to enable the cgroup in GEM-based drivers, media > > ones and dma-buf heaps. The dma-buf heaps are simple enough since the > > heaps names are supposed to be stable. > > I've used your patch as a base enable cgroup in drivers that use the VRAM > manager. I didn't want to enable it for all of GEM, because it would > conflict with drivers using TTM. Some more discussion is needed first. > > For DMA-BUF heaps, I think it's fine and there is a lot less need of > discussion. I just felt it should be sent separately from the initial > enablement. Definitely. > > I don't think using card0 vs card1 (or v4l0 vs v4l1 for example) will > > work because I don't think we have any sort of guarantee that these > > names will always point to the same devices across reboots or updates. > > > > If the module is loaded later than it used to for example, we could very > > well end up in a situation where card0 and card1 are swapped, while the > > constraints apply to the previous situation. > > I agree, just put it out there for discussion. I don't think the benefits > weigh up against the downsides :-) Oh absolutely. The way to define a stable name is going to be framework specific anyway. My point was that we wanted to have a stable name. Maxime
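A rough sketch of what dev_name()-based region naming could look like on the driver side. The dmem_cgroup_register_region() helper is the one added in patch 1 of the series, but its exact signature is assumed here rather than taken from this thread, and my_vram_mgr is a hypothetical driver structure.

```c
/*
 * Sketch only: register a VRAM region keyed on dev_name() rather than the
 * cardN minor, since dev_name() (e.g. "0000:03:00.0" for PCI devices) is
 * stable across reboots. dmem_cgroup_register_region() comes from patch 1;
 * its exact signature is assumed here.
 */
#include <linux/cgroup_dmem.h>
#include <linux/device.h>
#include <linux/err.h>

struct my_vram_mgr {
	struct dmem_cgroup_region *cg_region;
	u64 vram_size;
};

static int my_vram_mgr_init_cgroup(struct device *dev, struct my_vram_mgr *mgr)
{
	/* Produces a region name like "drm/0000:03:00.0/vram0". */
	mgr->cg_region = dmem_cgroup_register_region(mgr->vram_size,
						     "drm/%s/vram0",
						     dev_name(dev));
	return PTR_ERR_OR_ZERO(mgr->cg_region);
}
```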
Hey, Den 2024-12-13 kl. 16:21, skrev Maxime Ripard: > On Fri, Dec 13, 2024 at 03:53:13PM +0100, Maarten Lankhorst wrote: >> >> >> Den 2024-12-13 kl. 14:03, skrev Maxime Ripard: >>> Hi, >>> >>> Thanks for the new update! >>> >>> On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: >>>> New update. Instead of calling it the 'dev' cgroup, it's now the >>>> 'dmem' cgroup. >>>> >>>> Because it only deals with memory regions, the UAPI has been updated >>>> to use dmem.min/low/max/current, and to make the API cleaner, the >>>> names are changed too. >>> >>> The API is much nicer, and fits much better into other frameworks too. >>> >>>> dmem.current could contain a line like: >>>> "drm/0000:03:00.0/vram0 1073741824" >>>> >>>> But I think using "drm/card0/vram0" instead of PCIID would perhaps be >>>> good too. I'm open to changing it to that based on feedback. >>> >>> Do we have any sort of guarantee over the name card0 being stable across >>> reboots? >>> >>> I also wonder if we should have a "total" device that limits the amount >>> of memory we can allocate from any region? >> >> I don't think it is useful. Say your app can use 1 GB of main memory or 2 GB >> of VRAM, it wouldn't make sense to limit the total of those. In a lot of >> cases there is only 1 region, so the total of that would still be the same. >> >> On top, we just separated the management of each region, adding a 'total' >> would require unseparating it again. :-) > > I didn't mean the total for a device, but for the system. It would > definitely not make sense for a VRAM, but for CMA for example, you have > a single, limited, allocator that will be accessible from heaps, v4l2 > and DRM devices. > > If an application has to allocate both from v4l2 and DRM buffers, we > should be able to limit its total usage of CMA, not just on a single > device. In this case, I think it makes more sense for CMA to create a single region that is then used by both v4l2 and DRM, instead of a separate region for each, with CMA being responsible for its lifetime. Cheers, ~Maarten
On Fri, Dec 13, 2024 at 05:06:05PM +0100, Maarten Lankhorst wrote: > Hey, > > Den 2024-12-13 kl. 16:21, skrev Maxime Ripard: > > On Fri, Dec 13, 2024 at 03:53:13PM +0100, Maarten Lankhorst wrote: > > > > > > > > > Den 2024-12-13 kl. 14:03, skrev Maxime Ripard: > > > > Hi, > > > > > > > > Thanks for the new update! > > > > > > > > On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: > > > > > New update. Instead of calling it the 'dev' cgroup, it's now the > > > > > 'dmem' cgroup. > > > > > > > > > > Because it only deals with memory regions, the UAPI has been updated > > > > > to use dmem.min/low/max/current, and to make the API cleaner, the > > > > > names are changed too. > > > > > > > > The API is much nicer, and fits much better into other frameworks too. > > > > > > > > > dmem.current could contain a line like: > > > > > "drm/0000:03:00.0/vram0 1073741824" > > > > > > > > > > But I think using "drm/card0/vram0" instead of PCIID would perhaps be > > > > > good too. I'm open to changing it to that based on feedback. > > > > > > > > Do we have any sort of guarantee over the name card0 being stable across > > > > reboots? > > > > > > > > I also wonder if we should have a "total" device that limits the amount > > > > of memory we can allocate from any region? > > > > > > I don't think it is useful. Say your app can use 1 GB of main memory or 2 GB > > > of VRAM, it wouldn't make sense to limit the total of those. In a lot of > > > cases there is only 1 region, so the total of that would still be the same. > > > > > > On top, we just separated the management of each region, adding a 'total' > > > would require unseparating it again. :-) > > > > I didn't mean the total for a device, but for the system. It would > > definitely not make sense for a VRAM, but for CMA for example, you have > > a single, limited, allocator that will be accessible from heaps, v4l2 > > and DRM devices. > > > > If an application has to allocate both from v4l2 and DRM buffers, we > > should be able to limit its total usage of CMA, not just on a single > > device. > > In this case, I think it makes more sense if CMA creates a region, then use > that region in both v4l2 and DRM instead of a separate region for both, with > CMA being responsible for lifetime. Ack, thanks for your feedback :) Maxime
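A purely illustrative sketch of the arrangement Maarten describes above: the CMA core owns a single dmem region and hands the same handle to every consumer, so v4l2 and DRM allocations are charged against one shared pool. The cma_dmem_* helpers below do not exist anywhere; only cma_get_size()/cma_get_name() are existing CMA API, and the dmem_cgroup_register_region() signature is assumed as in the earlier sketch.

```c
/*
 * Illustrative only: the CMA core registers one dmem region and exposes it
 * to all consumers (dma-buf heaps, v4l2, DRM), so every CMA allocation is
 * charged against the same pool. The cma_dmem_* helpers are hypothetical.
 */
#include <linux/cgroup_dmem.h>
#include <linux/cma.h>
#include <linux/err.h>

static struct dmem_cgroup_region *cma_dmem_region;

int cma_dmem_init(struct cma *cma)
{
	cma_dmem_region = dmem_cgroup_register_region(cma_get_size(cma),
						      "cma/%s",
						      cma_get_name(cma));
	return PTR_ERR_OR_ZERO(cma_dmem_region);
}

/* The v4l2 and DRM allocation paths would charge against the same region. */
struct dmem_cgroup_region *cma_dmem_get_region(void)
{
	return cma_dmem_region;
}
```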
Hey, Now that all patches look good, what is needed to merge the series? Without patch 6/7 as it is a hack for testing. I've also posted an IGT test verifying that reads and writes work (to rule out copy/paste errors) and that the min and max semantics work as intended. https://lists.freedesktop.org/archives/igt-dev/2024-December/083345.html Cheers, ~Maarten Den 2024-12-17 kl. 08:46, skrev Maxime Ripard: > On Fri, Dec 13, 2024 at 05:06:05PM +0100, Maarten Lankhorst wrote: >> Hey, >> >> Den 2024-12-13 kl. 16:21, skrev Maxime Ripard: >>> On Fri, Dec 13, 2024 at 03:53:13PM +0100, Maarten Lankhorst wrote: >>>> >>>> >>>> Den 2024-12-13 kl. 14:03, skrev Maxime Ripard: >>>>> Hi, >>>>> >>>>> Thanks for the new update! >>>>> >>>>> On Wed, Dec 04, 2024 at 02:44:00PM +0100, Maarten Lankhorst wrote: >>>>>> New update. Instead of calling it the 'dev' cgroup, it's now the >>>>>> 'dmem' cgroup. >>>>>> >>>>>> Because it only deals with memory regions, the UAPI has been updated >>>>>> to use dmem.min/low/max/current, and to make the API cleaner, the >>>>>> names are changed too. >>>>> >>>>> The API is much nicer, and fits much better into other frameworks too. >>>>> >>>>>> dmem.current could contain a line like: >>>>>> "drm/0000:03:00.0/vram0 1073741824" >>>>>> >>>>>> But I think using "drm/card0/vram0" instead of PCIID would perhaps be >>>>>> good too. I'm open to changing it to that based on feedback. >>>>> >>>>> Do we have any sort of guarantee over the name card0 being stable across >>>>> reboots? >>>>> >>>>> I also wonder if we should have a "total" device that limits the amount >>>>> of memory we can allocate from any region? >>>> >>>> I don't think it is useful. Say your app can use 1 GB of main memory or 2 GB >>>> of VRAM, it wouldn't make sense to limit the total of those. In a lot of >>>> cases there is only 1 region, so the total of that would still be the same. >>>> >>>> On top, we just separated the management of each region, adding a 'total' >>>> would require unseparating it again. :-) >>> >>> I didn't mean the total for a device, but for the system. It would >>> definitely not make sense for a VRAM, but for CMA for example, you have >>> a single, limited, allocator that will be accessible from heaps, v4l2 >>> and DRM devices. >>> >>> If an application has to allocate both from v4l2 and DRM buffers, we >>> should be able to limit its total usage of CMA, not just on a single >>> device. >> >> In this case, I think it makes more sense if CMA creates a region, then use >> that region in both v4l2 and DRM instead of a separate region for both, with >> CMA being responsible for lifetime. > > Ack, thanks for your feedback :) > > Maxime
On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: > Now that all patches look good, what is needed to merge the series? Without > patch 6/7 as it is a hack for testing. There were some questions raised about device naming. One thing we want to get right from the beginning is the basic interface. Thanks.
On Tue, Dec 17, 2024 at 07:11:00AM -1000, Tejun Heo wrote: > On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: > > Now that all patches look good, what is needed to merge the series? Without > > patch 6/7 as it is a hack for testing. > > There were some questions raised about device naming. One thing we want to > get right from the beginning is the basic interface. We decided on the previous version to use dmem, and I believe this version has switched to it. Do you still have any concerns? Maxime
Den 2024-12-17 kl. 18:11, skrev Tejun Heo: > On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: >> Now that all patches look good, what is needed to merge the series? Without >> patch 6/7 as it is a hack for testing. > > There were some questions raised about device naming. One thing we want to > get right from the beginning is the basic interface. > > Thanks. > I believe it was resolved. The conclusion appears to be that we go with what we defined in this series: drm/$pciid/$regionname, with the only regions defined for now being VRAM. Main memory will be a followup, but it requires some discussion on how to prevent double accounting, and on what to do with the limited amount of mappable memory. Cheers, ~Maarten
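For context on how a VRAM region registered under the agreed drm/$pciid/$regionname scheme is actually used, here is a rough sketch of the allocation-time accounting a VRAM manager would do. The dmem_cgroup_try_charge()/dmem_cgroup_uncharge() helpers come from the series' new <linux/cgroup_dmem.h>, but their exact signatures (including whether the limit-pool out-parameter is optional) are assumed here, and my_vram_alloc is a hypothetical structure.

```c
/*
 * Sketch only: charge/uncharge a dmem region around a VRAM allocation.
 * Helper names come from the series; signatures are assumed, not verified.
 */
#include <linux/cgroup_dmem.h>

struct my_vram_alloc {
	struct dmem_cgroup_pool_state *cg_pool;
	u64 size;
};

static int my_vram_alloc_charge(struct dmem_cgroup_region *region,
				struct my_vram_alloc *alloc)
{
	/*
	 * Fails once the calling cgroup's dmem.max for this region would be
	 * exceeded; the second out-pointer (assumed optional here) would
	 * report which ancestor hit its limit so eviction can target it.
	 */
	return dmem_cgroup_try_charge(region, alloc->size,
				      &alloc->cg_pool, NULL);
}

static void my_vram_alloc_free(struct my_vram_alloc *alloc)
{
	dmem_cgroup_uncharge(alloc->cg_pool, alloc->size);
}
```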
Hello, On Tue, Dec 17, 2024 at 06:37:22PM +0100, Maarten Lankhorst wrote: > Den 2024-12-17 kl. 18:11, skrev Tejun Heo: > > On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: > > > Now that all patches look good, what is needed to merge the series? Without > > > patch 6/7 as it is a hack for testing. > > > > There were some questions raised about device naming. One thing we want to > > get right from the beginning is the basic interface. > > > > Thanks. > > > I believe it was solved. The conclusion appears to be that we go with how we > defined it in this series. drm/$pciid/$regionname. With the only regions > defined now being VRAM. Main memory will be a followup, but requires some > discussions on hwo to be prevent double accounting, and what to do with the > limited amount of mappable memory. Provided Johannes is okay with the series, how do you want to route the series? If you want to route it through drm, that's fine by me and please feel free to add: Acked-by: Tejun Heo <tj@kernel.org> Thanks.
Hey, Den 2024-12-17 kl. 19:23, skrev Tejun Heo: > Hello, > > On Tue, Dec 17, 2024 at 06:37:22PM +0100, Maarten Lankhorst wrote: >> Den 2024-12-17 kl. 18:11, skrev Tejun Heo: >>> On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: >>>> Now that all patches look good, what is needed to merge the series? Without >>>> patch 6/7 as it is a hack for testing. >>> >>> There were some questions raised about device naming. One thing we want to >>> get right from the beginning is the basic interface. >>> >>> Thanks. >>> >> I believe it was solved. The conclusion appears to be that we go with how we >> defined it in this series. drm/$pciid/$regionname. With the only regions >> defined now being VRAM. Main memory will be a followup, but requires some >> discussions on hwo to be prevent double accounting, and what to do with the >> limited amount of mappable memory. > > Provided Johannes is okay with the series, how do you want to route the > series? If you want to route it through drm, that's fine by me and please > feel free to add: > > Acked-by: Tejun Heo <tj@kernel.org> > > Thanks. > Thank you! I've discussed this with the DRM maintainers. What was suggested is to create a topic branch and merge it into drm-misc, where it will be picked up into drm.git during the next pull request. At the same time, the topic branch can also be merged into the cgroup tree if needed. The drm-misc tree already handles dma-buf and fbdev core, so I think DMEM could fit in there too. Cheers, ~Maarten
On 17.12.24 18:37, Maarten Lankhorst wrote: > > > Den 2024-12-17 kl. 18:11, skrev Tejun Heo: >> On Tue, Dec 17, 2024 at 03:28:50PM +0100, Maarten Lankhorst wrote: >>> Now that all patches look good, what is needed to merge the series? >>> Without >>> patch 6/7 as it is a hack for testing. >> >> There were some questions raised about device naming. One thing we >> want to >> get right from the beginning is the basic interface. >> >> Thanks. >> > I believe it was solved. The conclusion appears to be that we go with > how we defined it in this series. drm/$pciid/$regionname. Yeah, I'd agree. Using PCI IDs works, and the objection about cardN syntax being unstable when driver load order changes is a good point. Friedrich > With the only > regions defined now being VRAM. Main memory will be a followup, but > requires some discussions on hwo to be prevent double accounting, and > what to do with the limited amount of mappable memory. > > Cheers, > ~Maarten