2026/03/10

Newest at the top

2026-03-10 13:38:09 +0100	<Freakie>	the thing is just that the BDDs themselves are not that large relative to the data allocated during the algorithms
2026-03-10 13:38:04 +0100	<comerijn>	Freakie: When you create a compact region the GC copies a (transitive) data structure from the heap to a separate region, and then during GC we either free the entire region at once or only copy the pointer to the region to the new heap. Reducing GC cost from however large the data to 1 pointer
2026-03-10 13:36:56 +0100	<comerijn>	Freakie: Ok, so you have some tree data structure scattered across your heap, right? Every GC you copy that entire thing to a new heap, then wipe the old. If that tree is huge and you keep it around a long time, that's a lot of continual copying
2026-03-10 13:36:52 +0100	<Freakie>	I think it wouldn't help much though because almost all of the allocations in my program comes manipulating the priority queue (regardless of the implementation)
2026-03-10 13:36:00 +0100	<Freakie>	so basically you mean pinning the BDDs while traversing them?
2026-03-10 13:35:31 +0100	<comerijn>	Freakie: Where compact regions help is if you have a large BDD that you keep around indefinitely, by shoving it off into a compact region it basically no longer counts as live data for the GC scaling
2026-03-10 13:35:31 +0100	<Freakie>	because the children can be anywhere on a deeper level
2026-03-10 13:35:17 +0100	<Freakie>	the point of the priority queue is to extract all relevant data from a particular node before progressing past it, but there's no neat structure in terms of compactness
2026-03-10 13:34:19 +0100	<Freakie>	other times it's just traversal for e.g. satcount
2026-03-10 13:34:10 +0100	<Freakie>	a separate algorithm then builds up the BDD again
2026-03-10 13:33:57 +0100	<Freakie>	the apply algorithm takes two bdds (lists) and outputs lists of arcs that make up an unreduced bdd
2026-03-10 13:33:35 +0100	<Freakie>	yes and no? depends on the algorithm
2026-03-10 13:33:16 +0100	<comerijn>	Freakie: So you're incrementally building the representation of a BDD?
2026-03-10 13:32:44 +0100	<Freakie>	so the priority queue gives the most relevant information based on where in the list I am
2026-03-10 13:32:35 +0100	<comerijn>	Freakie: The point of regions is that you have, say 500 MB of data that is live for your entire algorithm then you reduce the GC cost of that data from "copy 500 MB" to "copy a pointer to the compact region"
2026-03-10 13:32:08 +0100	<Freakie>	yeah so I have a list representation of the BDD nodes that I need to either work on top down or bottom up; in either case I need to get as much information as possible from the input lists before proceeding to the tail
2026-03-10 13:31:20 +0100	<comerijn>	Yeah
2026-03-10 13:31:03 +0100	<Freakie>	are you familiar with binary decision diagrams? just for the sake of terminology
2026-03-10 13:30:47 +0100	<Freakie>	I'm a little unsure how compact regions might help since other than what I just mentioned I don't think there are a lot of options to separate the regions during the iterations of the algorithm itself
2026-03-10 13:30:44 +0100	<comerijn>	What are you putting in the queue?
2026-03-10 13:30:12 +0100	<Freakie>	which shouldn't help asymptotically but should reduce the dependency on hte priority queue
2026-03-10 13:29:34 +0100	<comerijn>	Freakie: this blog https://www.channable.com/tech/lessons-in-managing-haskell-memory
2026-03-10 13:29:33 +0100	<Freakie>	I'm planning on restructuring the code so that it can take use of the layout of the lists I'm working with
2026-03-10 13:29:03 +0100	<comerijn>	hmm
2026-03-10 13:29:00 +0100	<Freakie>	for both of those
2026-03-10 13:28:58 +0100	<Freakie>	it didn't seem to have much of an impact when I tried but I could try again
2026-03-10 13:28:53 +0100	<comerijn>	Might wanna try tweaking the number of generations too?
2026-03-10 13:28:31 +0100	<comerijn>	Freakie: Try the default GC with the -qg runtime option?
2026-03-10 13:28:28 +0100	<Freakie>	it's an acceptable time loss compared to being able to actually scale the problem (if it doesn't get worse for even greater input sizes anyway)
2026-03-10 13:28:10 +0100	<Freakie>	I mean yeah productivity suffered technically but the runtime was about the same
2026-03-10 13:27:47 +0100	<comerijn>	But in terms of productivity the default GC is better
2026-03-10 13:27:47 +0100	<Freakie>	nonmoving-gc roughly halved residency and total memory in use for the largest input size I've tested
2026-03-10 13:27:29 +0100	<comerijn>	The main downside of the moving GC is that it causes longer GC pauses (bad for latency sensitive/real-time applications)
2026-03-10 13:27:05 +0100	<comerijn>	Freakie: then the regular one
2026-03-10 13:26:57 +0100	<comerijn>	Freakie: nonmoving-gc will generally have lower throughput
2026-03-10 13:26:56 +0100	<Freakie>	no, I thought that would be too difficult to reason
2026-03-10 13:26:42 +0100	<comerijn>	Freakie: Are you running threaded runtime and large number of capabilities?
2026-03-10 13:26:18 +0100	<Freakie>	nursery sizes seemed to have little effect as well
2026-03-10 13:26:05 +0100	<Freakie>	i've played around with concurrency (--nonmoving-gc) and --nonmoving-dense-allocator-count
2026-03-10 13:25:33 +0100	<Freakie>	in my benchmark
2026-03-10 13:25:29 +0100	<Freakie>	for n = 7 max residency is at about 40 mb while for n = 8 it jumpts to 920 mb
2026-03-10 13:25:06 +0100	<comerijn>	Freakie: Oh, also where you giving any specific GC flags?
2026-03-10 13:24:52 +0100	<comerijn>	Freakie: Compact regions let you dump 4GB into a single compact region and then skip copying it while it's live)
2026-03-10 13:24:46 +0100	<Freakie>	right now my workaround (for memory) is using concurrent GC which slightly increases runtime but significantly helps with memory consumption
2026-03-10 13:24:29 +0100	<comerijn>	Freakie: The problem is that if you have 4GB residency then you're copying 4GB each GC
2026-03-10 13:24:13 +0100	<Freakie>	I've done lots of small optimizations since
2026-03-10 13:24:10 +0100	<comerijn>	Freakie: GHC does a copy&collect (so all live data is copied into a new heap, then we just wipe the old heap)
2026-03-10 13:24:09 +0100	<Freakie>	I already had compactness in mind but the one time I tried doing it it, it seemed to have no effect
2026-03-10 13:23:49 +0100	<comerijn>	Freakie: What c_wraith mentioned might help (compact regions) those let you treat large chunks of life data as a single pointer, which will speed up GC considerably
2026-03-10 13:23:18 +0100	<comerijn>	I'll have to see if I can find it