2025/11/13

Newest at the top

2025-11-13 12:59:23 +0100merijn(~merijn@77.242.116.146) merijn
2025-11-13 12:54:28 +0100 <[exa]> also if you'd get the O(n) worst-case with this one, you'd be equivalently screwed with the hashmaps ("replace your hash")
2025-11-13 12:53:57 +0100 <[exa]> lucabtz: not really, multimaps can have many same keys and you select the whole range
2025-11-13 12:52:24 +0100deptype_(~deptype@2406:b400:3a:73c2:fdd5:1805:7d92:70ad)
2025-11-13 12:52:05 +0100deptype_(~deptype@2406:b400:3a:73c2:8206:7870:138f:c565) (Remote host closed the connection)
2025-11-13 12:42:47 +0100merijn(~merijn@77.242.116.146) (Ping timeout: 256 seconds)
2025-11-13 12:42:01 +0100biberu(~biberu@user/biberu) biberu
2025-11-13 12:36:58 +0100merijn(~merijn@77.242.116.146) merijn
2025-11-13 12:33:41 +0100xff0x(~xff0x@2405:6580:b080:900:7b8c:bddf:6c13:ed0c)
2025-11-13 12:32:15 +0100deptype_(~deptype@2406:b400:3a:73c2:8206:7870:138f:c565)
2025-11-13 12:32:02 +0100deptype_(~deptype@2406:b400:3a:73c2:d739:473:9e2d:bf26) (Remote host closed the connection)
2025-11-13 12:31:52 +0100 <lucabtz> and it still would be better space wise
2025-11-13 12:30:35 +0100xff0x(~xff0x@2405:6580:b080:900:7cd4:5734:7947:6d90) (Quit: xff0x)
2025-11-13 12:30:01 +0100Lycurgus(~juan@user/Lycurgus) Lycurgus
2025-11-13 12:29:04 +0100 <lucabtz> the worst case complexity would still be O(n), maybe the worst case would incredibly unlickely given the key space is so big though
2025-11-13 12:27:46 +0100 <lucabtz> but the tree would need to contain bins are leaves no?
2025-11-13 12:25:13 +0100trickard__trickard
2025-11-13 12:25:13 +0100trickard(~trickard@cpe-62-98-47-163.wireline.com.au) (Ping timeout: 264 seconds)
2025-11-13 12:24:39 +0100 <[exa]> but don't :D
2025-11-13 12:24:19 +0100trickard__(~trickard@cpe-62-98-47-163.wireline.com.au)
2025-11-13 12:24:18 +0100 <[exa]> technically you don't need short hashes because you don't need to minimize the hash table, so you can always have say 64b hash, so collisions won't hurt too much, and at worst you simply find the one key that you wanted from the tree range that the search returns
2025-11-13 12:24:08 +0100xff0x(~xff0x@2405:6580:b080:900:7cd4:5734:7947:6d90)
2025-11-13 12:24:06 +0100merijn(~merijn@77.242.116.146) (Ping timeout: 256 seconds)
2025-11-13 12:22:54 +0100 <[exa]> multimap™
2025-11-13 12:21:20 +0100 <lucabtz> [exa] yeah i didnt think of that, though it wouldnt work, what about hash collisions?
2025-11-13 12:19:52 +0100 <[exa]> lucabtz: like at wurst you can commit a heinous crime and order by the hash. Not sure if the other ways is doable universally tho.
2025-11-13 12:17:48 +0100 <merijn> (important side note that containers indeed guarantees that foldable/traversable operation happen in ascending key order)
2025-11-13 12:17:47 +0100 <lucabtz> so it isnt very different
2025-11-13 12:17:42 +0100 <lucabtz> but you need a hash in the hash map
2025-11-13 12:17:32 +0100 <lucabtz> though you need ordering for a tree map which you dont for a hash map
2025-11-13 12:17:15 +0100 <[exa]> <3
2025-11-13 12:17:11 +0100 <merijn> <3
2025-11-13 12:17:09 +0100 <merijn> Guaranteed stable ordering for traversals/iteration too
2025-11-13 12:17:04 +0100 <lucabtz> merijn yeah yeah your point is valid
2025-11-13 12:16:48 +0100 <merijn> Additionally the ability of ordered maps to look up "smallest key bigger than X" or "smallest key larger than X" is super useful in many situations.
2025-11-13 12:16:12 +0100 <merijn> 20-30% more space than you have data
2025-11-13 12:16:06 +0100 <merijn> lucabtz: A decent tree implementation like red-black or AVL tree guarantee O(log n) worst case lookup and insert (vs O(n) worst case for hashmap). Now hashmap can have better average case, but that depends on the ratio of keys to buckets. You need a certain percentage of empty buckets to avoid collissions, I don't know the optimal numbers but I'm betting at least 20-30%. So that means you must allocate
2025-11-13 12:15:50 +0100 <[exa]> kuribas`: anyway the most formal argument against the hashmaps that I have is literally that cache bump that you saw there in the benchmarks. With hashmaps it's unavoidable if your map grows; with sensible maps you can make it disappear using some kind of locality on a map of any size.
2025-11-13 12:13:11 +0100 <[exa]> doc: no that's embedding_vector_map, that was popular 3 years ago. Behold: `insertAiMap k v = unsafePerformGemini $ "Please remember that " ++ show k ++ " saves " ++ show v`
2025-11-13 12:12:54 +0100 <merijn> doc: I mean, replace the hashmap with a metric tree and that's actually neat
2025-11-13 12:12:46 +0100 <lucabtz> merijn im a C++ developer as day job and always hated that std::map is not a hashmap but you are making me reconsider
2025-11-13 12:12:24 +0100 <haskellbridge> <doc> i am sure that has gone through a couple hype cycles already
2025-11-13 12:11:41 +0100 <haskellbridge> <doc> ai_map.. now that's just a hashmap where everything is hashed to embedding vectors :^)
2025-11-13 12:11:27 +0100 <[exa]> merijn: I told them they failed me, don't worry. :D
2025-11-13 12:10:55 +0100 <[exa]> anyway yeah the unordered wording is great there, tells nicely which property is lost
2025-11-13 12:10:46 +0100 <merijn> [exa]: Fail them!
2025-11-13 12:10:36 +0100 <[exa]> merijn: I saw stuff like `using map = std::unordered_map;` from students, wasn't happy
2025-11-13 12:10:29 +0100Taneb(~username@host-95-251-57-201.retail.telecomitalia.it) (Ping timeout: 256 seconds)
2025-11-13 12:10:19 +0100 <merijn> The only reason I'm not a professional SQLite shill is that I haven't figured out how to get paid :p
2025-11-13 12:09:43 +0100 <[exa]> ...at which point the cool thing is gonna be std::ai_map or so