Newest at the top
| 2026-03-10 13:34:19 +0100 | <Freakie> | other times it's just traversal for e.g. satcount |
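[Editor's note: a minimal sketch of the satcount traversal mentioned above. The `BDD` type and level-weighting scheme here are hypothetical illustrations, not Freakie's actual list-of-arcs representation.]

```haskell
-- Hypothetical tree-shaped BDD type; the discussion uses a list/arc layout instead.
data BDD = Leaf Bool | Node Int BDD BDD  -- Node: variable level, low branch, high branch

-- The level a subgraph starts at; terminals sit just below the last variable.
level :: Int -> BDD -> Int
level n (Leaf _)     = n
level _ (Node i _ _) = i

-- satCount over variables 0..n-1: number of satisfying assignments.
-- Skipped levels each double the count, hence the powers of two.
satCount :: Int -> BDD -> Integer
satCount n root = 2 ^ level n root * go root
  where
    go (Leaf b)       = if b then 1 else 0
    go (Node i lo hi) =
        2 ^ (level n lo - i - 1) * go lo
      + 2 ^ (level n hi - i - 1) * go hi

main :: IO ()
main = do
  -- f(x0, x1) = x0, over 2 variables: two satisfying assignments.
  print (satCount 2 (Node 0 (Leaf False) (Leaf True)))  -- 2
  -- constant True over 2 variables: all four assignments satisfy it.
  print (satCount 2 (Leaf True))                        -- 4
```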
| 2026-03-10 13:34:10 +0100 | <Freakie> | a separate algorithm then builds up the BDD again |
| 2026-03-10 13:33:57 +0100 | <Freakie> | the apply algorithm takes two bdds (lists) and outputs lists of arcs that make up an unreduced bdd |
| 2026-03-10 13:33:35 +0100 | <Freakie> | yes and no? depends on the algorithm |
| 2026-03-10 13:33:16 +0100 | <comerijn> | Freakie: So you're incrementally building the representation of a BDD? |
| 2026-03-10 13:32:44 +0100 | <Freakie> | so the priority queue gives the most relevant information based on where in the list I am |
| 2026-03-10 13:32:35 +0100 | <comerijn> | Freakie: The point of regions is that you have, say 500 MB of data that is live for your entire algorithm then you reduce the GC cost of that data from "copy 500 MB" to "copy a pointer to the compact region" |
| 2026-03-10 13:32:08 +0100 | <Freakie> | yeah so I have a list representation of the BDD nodes that I need to either work on top down or bottom up; in either case I need to get as much information as possible from the input lists before proceeding to the tail |
| 2026-03-10 13:31:20 +0100 | <comerijn> | Yeah |
| 2026-03-10 13:31:03 +0100 | <Freakie> | are you familiar with binary decision diagrams? just for the sake of terminology |
| 2026-03-10 13:30:47 +0100 | <Freakie> | I'm a little unsure how compact regions might help since other than what I just mentioned I don't think there are a lot of options to separate the regions during the iterations of the algorithm itself |
| 2026-03-10 13:30:44 +0100 | <comerijn> | What are you putting in the queue? |
| 2026-03-10 13:30:12 +0100 | <Freakie> | which shouldn't help asymptotically but should reduce the dependency on the priority queue |
| 2026-03-10 13:29:34 +0100 | <comerijn> | Freakie: this blog https://www.channable.com/tech/lessons-in-managing-haskell-memory |
| 2026-03-10 13:29:33 +0100 | <Freakie> | I'm planning on restructuring the code so that it can make use of the layout of the lists I'm working with |
| 2026-03-10 13:29:03 +0100 | <comerijn> | hmm |
| 2026-03-10 13:29:00 +0100 | <Freakie> | for both of those |
| 2026-03-10 13:28:58 +0100 | <Freakie> | it didn't seem to have much of an impact when I tried but I could try again |
| 2026-03-10 13:28:53 +0100 | <comerijn> | Might wanna try tweaking the number of generations too? |
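[Editor's note: the RTS flags discussed here, collected in one place. `bdd-bench` is a placeholder for the actual benchmark binary; the flag values are illustrative, not recommendations.]

```shell
# Nonmoving (concurrent) collector, with its dense allocator count tuned:
./bdd-bench +RTS -s --nonmoving-gc --nonmoving-dense-allocator-count=16 -RTS

# Default copying collector, parallel GC disabled (-qg), 3 generations (-G),
# 64 MB nursery (-A); -s prints the GC/residency summary either way:
./bdd-bench +RTS -s -qg -G3 -A64m -RTS
```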
| 2026-03-10 13:28:31 +0100 | <comerijn> | Freakie: Try the default GC with the -qg runtime option? |
| 2026-03-10 13:28:28 +0100 | <Freakie> | it's an acceptable time loss compared to being able to actually scale the problem (if it doesn't get worse for even greater input sizes anyway) |
| 2026-03-10 13:28:10 +0100 | <Freakie> | I mean yeah productivity suffered *technically* but the runtime was about the same |
| 2026-03-10 13:27:47 +0100 | <comerijn> | But in terms of productivity the default GC is better |
| 2026-03-10 13:27:47 +0100 | <Freakie> | nonmoving-gc roughly halved residency and total memory in use for the largest input size I've tested |
| 2026-03-10 13:27:29 +0100 | <comerijn> | The main downside of the moving GC is that it causes longer GC pauses (bad for latency sensitive/real-time applications) |
| 2026-03-10 13:27:05 +0100 | <comerijn> | Freakie: then the regular one |
| 2026-03-10 13:26:57 +0100 | <comerijn> | Freakie: nonmoving-gc will generally have lower throughput |
| 2026-03-10 13:26:56 +0100 | <Freakie> | no, I thought that would be too difficult to reason about |
| 2026-03-10 13:26:42 +0100 | <comerijn> | Freakie: Are you running threaded runtime and large number of capabilities? |
| 2026-03-10 13:26:18 +0100 | <Freakie> | nursery sizes seemed to have little effect as well |
| 2026-03-10 13:26:05 +0100 | <Freakie> | i've played around with concurrency (--nonmoving-gc) and --nonmoving-dense-allocator-count |
| 2026-03-10 13:25:33 +0100 | <Freakie> | in my benchmark |
| 2026-03-10 13:25:29 +0100 | <Freakie> | for n = 7 max residency is at about 40 MB while for n = 8 it jumps to 920 MB |
| 2026-03-10 13:25:06 +0100 | <comerijn> | Freakie: Oh, also were you giving any specific GC flags? |
| 2026-03-10 13:24:52 +0100 | <comerijn> | Freakie: Compact regions let you dump 4GB into a single compact region and then skip copying it while it's live |
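[Editor's note: a minimal sketch of the compact-region idea described above, using `Data.Compact` from the ghc-compact boot library. The list and its size are stand-ins for the actual long-lived BDD data.]

```haskell
import Data.Compact (compact, getCompact)

main :: IO ()
main = do
  -- Fully evaluate a large structure into a compact region. The GC then
  -- treats the whole region as one object: it follows a single pointer
  -- instead of copying every cell on each collection.
  let bigList = [1 .. 1000000] :: [Int]
  region <- compact bigList
  print (sum (getCompact region))
```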
| 2026-03-10 13:24:46 +0100 | <Freakie> | right now my workaround (for memory) is using concurrent GC which slightly increases runtime but significantly helps with memory consumption |
| 2026-03-10 13:24:29 +0100 | <comerijn> | Freakie: The problem is that if you have 4GB residency then you're copying 4GB each GC |
| 2026-03-10 13:24:13 +0100 | <Freakie> | I've done lots of small optimizations since |
| 2026-03-10 13:24:10 +0100 | <comerijn> | Freakie: GHC does a copy&collect (so all live data is copied into a new heap, then we just wipe the old heap) |
| 2026-03-10 13:24:09 +0100 | <Freakie> | I already had compactness in mind but the one time I tried doing it, it seemed to have no effect |
| 2026-03-10 13:23:49 +0100 | <comerijn> | Freakie: What c_wraith mentioned might help (compact regions); those let you treat large chunks of live data as a single pointer, which will speed up GC considerably |
| 2026-03-10 13:23:18 +0100 | <comerijn> | I'll have to see if I can find it |
| 2026-03-10 13:23:09 +0100 | <Freakie> | can you link the post again? |
| 2026-03-10 13:22:59 +0100 | <Leary> | Freakie: In future, check the logs; they're linked in the topic. |
| 2026-03-10 13:22:40 +0100 | <comerijn> | Freakie: That and the fact that laziness does lots of short-lived allocations, so the GC is tuned for fast allocation and short-lived data |
| 2026-03-10 13:22:35 +0100 | <Freakie> | doublechecking residency |
| 2026-03-10 13:22:25 +0100 | <Freakie> | I don't have any metrics on the queue size but it's likely to be quite large in general because of the problem itself I'm benchmarking (encoding a boolean formula for n-queens) |
| 2026-03-10 13:21:43 +0100 | <comerijn> | And my other previous comment: there's a well known blogpost from ages ago of a company struggling with Haskell GC, also for a queue, and part of their problem (and, I suspect, yours) was that their inherent problem involved a large liveset |
| 2026-03-10 13:21:41 +0100 | <Freakie> | I assume that's linked to pointers being acyclic |
| 2026-03-10 13:21:17 +0100 | <comerijn> | Freakie: Therefore big live sets will increase your GC times quite a bit |