Message ID | 20240312094133.2084996-1-max.kellermann@ionos.com (mailing list archive) |
---|---|
Headers | show |
Series | Fast kernel headers: split linux/mm.h | expand |
On 12.03.24 10:41, Max Kellermann wrote: > This patch set aims to clean up the linux/mm.h header and reduce > dependencies on it by moving parts out. > > The goal was to eliminate dependencies on linux/mm.h from other > popular headers such as highmem.h and dma-mapping.h, and I started by > checking which symbols were really used and moved those declarations > to separate slim headers. > [...] > include/linux/mm.h | 583 +----------------- > include/linux/mm/devmap_managed.h | 37 ++ > include/linux/mm/folio_next.h | 27 + Isn't that a bit excessive? Do we really need that many folio headers?
On Tue, Mar 12, 2024 at 5:10 PM David Hildenbrand <david@redhat.com> wrote: > > include/linux/mm.h | 583 +----------------- > > include/linux/mm/devmap_managed.h | 37 ++ > > include/linux/mm/folio_next.h | 27 + > > Isn't that a bit excessive? Do we really need that many folio headers? It would technically be possible to have fewer headers by merging several of those headers back into one, but then the dependencies will be heavier, and eventually we'd be back at the large "mm.h" mess that I would like to get rid of. My patch set constitutes the balance of "not too many headers" vs "small set of unnecessary dependencies for each including source file" according to my taste. Max
On 12.03.24 17:27, Max Kellermann wrote: > On Tue, Mar 12, 2024 at 5:10 PM David Hildenbrand <david@redhat.com> wrote: >>> include/linux/mm.h | 583 +----------------- >>> include/linux/mm/devmap_managed.h | 37 ++ >>> include/linux/mm/folio_next.h | 27 + >> >> Isn't that a bit excessive? Do we really need that many folio headers? > > It would technically be possible to have fewer headers by merging > several of those headers back into one, but then the dependencies will > be heavier, and eventually we'd be back at the large "mm.h" mess that > I would like to get rid of. Just curious: why? Usually build time, do you have some numbers? > My patch set constitutes the balance of "not too many headers" vs > "small set of unnecessary dependencies for each including source file" > according to my taste. I'm not against splitting out stuff. But one function per header is a bit excessive IMHO. Ideally, we'd have some MM guideline that we'll be able to follow in the future. So we don't need "personal taste". For example, if I were to write a folio_prev(), should I put it in include/linux/mm/folio_prev.h ? Likely we'd put it into folio_next.h, but then the name doesn't make any sense. The point I am trying to make: if there was a single folio_ops.h, it would be clearer where it would go. Just my 2 cents, seeing one-function-per-file headers.
On Tue, Mar 12, 2024 at 5:33 PM David Hildenbrand <david@redhat.com> wrote: > Just curious: why? Usually build time, do you have some numbers? (This has been discussed so often and I thought having smaller header dependencies was already established as general consensus among kernel devs, and everybody agreed that mm.h and kernel.h are a big mess that needs cleanup.) Why: Correctness, less namespace pollution, and the least important aspect is reduced build times. However, the gains by this patch set are very small; each optimization gains very little. Some time ago, I posted a much larger patch set with a few numbers: https://lore.kernel.org/lkml/20240131145008.1345531-1-max.kellermann@ionos.com/ - the speedup was measurable, but not amazing, because even after that patch set, everybody still includes everything, and much more cleanup is needed to make a bigger difference. Once the big knots like kernel.h and mm.h are broken up, we will have better results. And as I said: build times are nice, but the lesser advantage. Anyway, this first patch set was so extremely large that nobody was able to review it. So this patch contains just the parts that deal with mm.h; later, when this patch set is merged, I can continue with other headers. (I already have a branch that splits kernel.h and I'll submit it eventually, after the mm.h cleanup.) > I'm not against splitting out stuff. But one function per header is a > bit excessive IMHO. One function per header is certainly not my goal and I agree it's excessive; but folio_next() in its own header made sense, just in this special case, because it allowed removing the mm.h dependency from bio.h. Removing this dependency was the goal, and folio_next.h was the solution for this particular problem. > Ideally, we'd have some MM guideline that we'll be > able to follow in the future. So we don't need "personal taste". Agree. But lacking such a guideline, all I can do is make suggestions and submit patches for review, trying to follow what seemed to be consensus in previous cleanups and what had previously been merged. > For example, if I were to write a folio_prev(), should I put it in > include/linux/mm/folio_prev.h ? Likely we'd put it into folio_next.h, > but then the name doesn't make any sense. True. But since no folio_prev() function exists, the only name that made sense for this header was folio_next.h. If folio_prev() gets added, I'd say put it in that header, but rename it to, let's say, "folio_iterator.h". But right now, with just this one function (and nothing else like it that could be added here), I decided to suggest naming it "folio_next.h". If you think "folio_iterator.h" or something else is better, I'll rename and resubmit; names don't matter so much to me; the general idea of cleaning up the headers is what's important. > The point I am trying to make: if there was a single folio_ops.h, it > would be clearer where it would go. Maybe, but IMO this wouldn't be a good place to add folio_next(), because folio_next() doesn't "operate" on a folio, but on a folio pointer (or "iterator"). Again, names don't matter to me - ideas/concepts matter. I'm only explaining why I decided to submit my patches that way, but I'll change them any way you people want. Max
On 12.03.24 19:11, Max Kellermann wrote: > On Tue, Mar 12, 2024 at 5:33 PM David Hildenbrand <david@redhat.com> wrote: >> Just curious: why? Usually build time, do you have some numbers? > > (This has been discussed so often and I thought having smaller header > dependencies was already established as general consensus among kernel > devs, and everybody agreed that mm.h and kernel.h are a big mess that > needs cleanup.) It's always a good idea to include at least some sentences about history+motivation in the cover letter. > > Why: Correctness, less namespace pollution, and the least important > aspect is reduced build times. However, the gains by this patch set > are very small; each optimization gains very little. > > Some time ago, I posted a much larger patch set with a few numbers: > https://lore.kernel.org/lkml/20240131145008.1345531-1-max.kellermann@ionos.com/ > - the speedup was measurable, but not amazing, because even after that > patch set, everybody still includes everything, and much more cleanup > is needed to make a bigger difference. Once the big knots like > kernel.h and mm.h are broken up, we will have better results. And as I > said: build times are nice, but the lesser advantage. Okay, so "Fast kernel headers:" is a bit misleading :P I'm asking these stupid questions because I remember some header splitup (from Christoph?) that really did make a significant build-time difference. Reading all that, I would have written something like this in the cover letter: "There is an agreement that we want to split up some of the big kernel headers, such as kernel.h and mm.h. While not delivering immediate build-time gains, it's one step into the desired direction. Besides build-time, smaller headers promise less namespace pollution and correctness." (although I don't immediately understand what correctness means in this context) > > Anyway, this first patch set was so extremely large that nobody was > able to review it. So this patch contains just the parts that deal > with mm.h; later, when this patch set is merged, I can continue with > other headers. (I already have a branch that splits kernel.h and I'll > submit it eventually, after the mm.h cleanup.) I'd have continued with "Some of these changes were part of a previous, bigger patchset [1]. I'm now sending out smaller, reviewable pieces. Splitting up mm.h is the first step." [1] https://lore.kernel.org/lkml/20240131145008.1345531-1-max.kellermann@ionos.com/ > >> I'm not against splitting out stuff. But one function per header is a >> bit excessive IMHO. > > One function per header is certainly not my goal and I agree it's > excessive; but folio_next() in its own header made sense, just in this > special case, because it allowed removing the mm.h dependency from > bio.h. Removing this dependency was the goal, and folio_next.h was the > solution for this particular problem. Okay, I see. I still wonder if merging some of the new folio headers in "folio.h" wouldn't have achieved something close to that. Sure, they bring in some other dependencies, but hopefully not mm.h Anyhow, what you are describing here would also have been reasonable to describe somewhere in few words. IOW, which approach did you chose to decide when a new header was reasonable. > >> Ideally, we'd have some MM guideline that we'll be >> able to follow in the future. So we don't need "personal taste". > > Agree. But lacking such a guideline, all I can do is make suggestions > and submit patches for review, trying to follow what seemed to be > consensus in previous cleanups and what had previously been merged. Possibly it makes sense to have some rough guideline. The problem I see is that once your stuff is merged, it will start getting messed up again over time. But maybe that's just the way it works. > >> For example, if I were to write a folio_prev(), should I put it in >> include/linux/mm/folio_prev.h ? Likely we'd put it into folio_next.h, >> but then the name doesn't make any sense. > > True. But since no folio_prev() function exists, the only name that > made sense for this header was folio_next.h. If folio_prev() gets > added, I'd say put it in that header, but rename it to, let's say, > "folio_iterator.h". But right now, with just this one function (and > nothing else like it that could be added here), I decided to suggest > naming it "folio_next.h". If you think "folio_iterator.h" or something > else is better, I'll rename and resubmit; names don't matter so much > to me; the general idea of cleaning up the headers is what's > important. > >> The point I am trying to make: if there was a single folio_ops.h, it >> would be clearer where it would go. > > Maybe, but IMO this wouldn't be a good place to add folio_next(), > because folio_next() doesn't "operate" on a folio, but on a folio > pointer (or "iterator"). Again, names don't matter to me - > ideas/concepts matter. I'm only explaining why I decided to submit my > patches that way, but I'll change them any way you people want. Thanks for the details.
On Tue, Mar 12, 2024 at 8:26 PM David Hildenbrand <david@redhat.com> wrote: > It's always a good idea to include at least some sentences about > history+motivation in the cover letter. Agree, thanks for your review and for the suggestions, that was very useful. Once somebody points it out, it's suddenly obvious what was missing in my explanations for others to understand my motivation. I'll add some more text to the cover letter. > Okay, so "Fast kernel headers:" is a bit misleading :P Yeah. It was the name chosen by Ingo Molnar for his header cleanup project, and some of his work was merged several years ago, so I thought this phrase would be familiar and would bring back memories from these discussions. > (although I don't immediately understand what correctness means in this > context) Some sources don't include all the headers they need; but by chance, this happens to work because deep down the include tree, somebody includes kernel.h, mm.h or fs.h and that pulls in everything else; thus incorrect includes are never a visible problem. By reducing includes to the bare minimum, many of those problems are revealed (by producing compiler errors that then have to be fixed). I very much like to write code in a way that the compiler points out some kinds of mistakes - I rather have a build-time error than a silent runtime breakage. Incorrect includes are not just a theoretical problem; imagine all the asm-generic headers that provide fallback definitions for macros that are not defined on some architectures (e.g. asm-generic/cacheflush.h). If, on some sources, the arch-specific header happens to be not included or defines the macro in the wrong header (I've seen it all), you suddenly randomly get the correct arch macro or the generic fallback macro, leading to inconsistencies, hidden ABI breakages that may lead to severe problems that will be hard to debug. One can imagine many more ways that code can break silently at runtime due to incorrect code (that usually works most of the time, but only by chance). Better be explicit about what you really need to include, and don't rely on other headers to indirectly include what you need. Include what you need even if somebody else down the tree had already included it, because that somebody else may change his mind at some point, and then your source shouldn't break. > The problem I see > is that once your stuff is merged, it will start getting messed up again > over time. But maybe that's just the way it works. It is. This is like cleaning your house: the following day, somebody visits you and brings nice presents (feature patches) but has dirty shoes, and then you have to clean the house again... that's the infinite cycle of software development, as we all know it. Max