2026/02/14

Newest at the top

2026-02-14 22:31:59 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds)
2026-02-14 22:31:44 +0100 machinedgod (~machinedg@d75-159-126-101.abhsia.telus.net) machinedgod
2026-02-14 22:31:32 +0100 <probie> Vector still has alignment issues though, so I'm not inclined to bend over backwards to get it to work
2026-02-14 22:26:51 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) merijn
2026-02-14 22:26:29 +0100 <tomsmeding> storable vectors are just C-style arrays as you expect
2026-02-14 22:26:10 +0100 <tomsmeding> so there is indirection between the vector you have in hand and the underlying storage
2026-02-14 22:25:59 +0100 <[exa]> probie: also highly suggest having a look at whether repa/massiv can do this, and if they can, copy what they did. IIRC these would still count as "pure" haskell.
2026-02-14 22:25:55 +0100 <tomsmeding> the idea of unboxed vectors is that they are struct-of-arrays transformed, i.e. a vector of (Int, Int) is actually two vectors under the hood
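As a rough C analogy of the struct-of-arrays layout described above (the `struct pair` type and `aos_to_soa` helper are hypothetical, and `long` stands in for Haskell's `Int`), one array of pairs becomes two parallel arrays. There is then no single buffer of pairs to hand to a pointer-taking primitive, which is part of why storable vectors are the easier fit.

```c
#include <stddef.h>

/* Array-of-structs layout: each (Int, Int) pair stored contiguously. */
struct pair {
    long fst;
    long snd;
};

/* Hypothetical helper: split an array of pairs into two parallel arrays,
 * i.e. the struct-of-arrays layout that, per the log, an unboxed vector
 * of (Int, Int) uses under the hood. */
void aos_to_soa(const struct pair *in, size_t n, long *fst, long *snd)
{
    for (size_t i = 0; i < n; i++) {
        fst[i] = in[i].fst;
        snd[i] = in[i].snd;
    }
}
```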
2026-02-14 22:25:32 +0100 <tomsmeding> recommend storable vectors for that though, as they have a sensible withForeignPtr function
2026-02-14 22:25:04 +0100 <[exa]> probie: there should be some (ugly but working) way to get a pointer to the vector which can be used as the required target type for the primitive op
2026-02-14 22:24:30 +0100 infinity0 (~infinity0@pwned.gg) infinity0
2026-02-14 22:23:55 +0100 <probie> I think if I want to do this in "pure" Haskell, I'm better off ditching vector, and using `MutableByteArray#`s with the explicit SIMD operations from ghc-prim
2026-02-14 22:21:54 +0100 <[exa]> :(
2026-02-14 22:21:42 +0100 <probie> [exa]: It did fail, and the reads are absolutely not aligned
2026-02-14 22:21:18 +0100 tcard (~tcard@2400:4051:5801:7500:cf17:befc:ff82:5303) (Quit: Leaving)
2026-02-14 22:20:25 +0100 <[exa]> probie: btw if this fails, try making sure the reads are aligned (no real clue how to help there tho.)
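One way to make sure reads are aligned, sketched in C under the assumption of a C11 toolchain: allocate the buffers with `aligned_alloc` and check pointers before using aligned loads. For reference, the SSE load `_mm_load_si128` requires a 16-byte-aligned address, while `_mm_loadu_si128` accepts any address. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is a multiple of align (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocate n bytes aligned to `align` (a power of two).
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so round n up first. Release the result with free(). */
uint8_t *alloc_aligned_bytes(size_t n, size_t align)
{
    size_t rounded = (n + align - 1) / align * align;
    if (rounded == 0)
        rounded = align;
    return aligned_alloc(align, rounded);
}
```

In the Haskell setting the same concern applies to whatever buffer backs the vector; the sketch only shows the C-side mechanics.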
2026-02-14 22:19:44 +0100 <tomsmeding> <3
2026-02-14 22:19:33 +0100 <[exa]> tomsmeding: cool I'm going to simd my stupid database string-indexing code :D
2026-02-14 22:19:28 +0100 L29Ah (~L29Ah@wikipedia/L29Ah) L29Ah
2026-02-14 22:17:27 +0100 <geekosaur> (I typo each of those into the other constantly…)
2026-02-14 22:16:34 +0100 <tomsmeding> I can't type any more
2026-02-14 22:16:32 +0100 <tomsmeding> s/ghc/gcc/
2026-02-14 22:16:00 +0100 <tomsmeding> -ffast-math dances circles around IEEE semantics, but memory semantics aren't broken down as far as I know
2026-02-14 22:15:42 +0100 <tomsmeding> I would be surprised if ghc does this at any optimisation level
2026-02-14 22:15:23 +0100 <tomsmeding> ignoring this is a blatant violation of the semantics
2026-02-14 22:15:13 +0100 <tomsmeding> and I can also assure you that GHC will not tell LLVM that these things do not alias
2026-02-14 22:15:11 +0100 <probie> Even gcc only ignores it if you pass -O3 IIRC
2026-02-14 22:14:52 +0100 <tomsmeding> this is DEFINITELY not ignored by llvm
2026-02-14 22:14:40 +0100 <tomsmeding> the _mm256 and _mm variants indeed have 3
2026-02-14 22:14:38 +0100 <[exa]> probie: anyway it might be the case that the compiler just ignores it but I'd bet this is the problem number 1
2026-02-14 22:14:33 +0100 <tomsmeding> [exa]: oh I mistyped, I meant _mm512_add_epi8
2026-02-14 22:14:29 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds)
2026-02-14 22:14:20 +0100 <[exa]> probie: memory order too strong QQ
2026-02-14 22:14:09 +0100 <tomsmeding> yes
2026-02-14 22:14:06 +0100 <probie> there can be aliasing
2026-02-14 22:14:04 +0100 <tomsmeding> probie: the compiler doesn't know that
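A small C sketch of why possible aliasing matters here (function names are hypothetical). Without a no-overlap promise, a store to `dst[i]` may change a later `a[i + 1]`, so the compiler cannot simply load a whole vector of `a` before storing; C's `restrict` qualifier is one way a programmer makes that promise explicitly.

```c
#include <stddef.h>
#include <stdint.h>

/* No restrict: the compiler must assume dst may overlap a or b, so a store
 * to dst[i] could change a later a[i + 1]. It may only vectorize this loop
 * after checking (e.g. at run time) that the buffers don't overlap. */
void add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* restrict: the caller promises the three buffers do not overlap, which
 * lets the compiler load many a[i] and b[i] before storing any dst[i].
 * Calling this with overlapping buffers is undefined behaviour. */
void add_bytes_restrict(uint8_t *restrict dst, const uint8_t *restrict a,
                        const uint8_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

With overlapping buffers, `add_bytes(buf + 1, buf, ones, 4)` on a zeroed five-byte `buf` yields 0, 1, 2, 3, 4, because each store feeds the next load; loading all of `a` first would instead give 0, 1, 1, 1, 1. That difference is what the compiler has to preserve when it has not been told the buffers are disjoint.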
2026-02-14 22:13:56 +0100 <probie> oh wait, <expletive>
2026-02-14 22:13:52 +0100 <[exa]> tomsmeding: where do you read that? intel intrinsics guide says 3 per cycle
2026-02-14 22:13:46 +0100 <probie> There isn't really a data dependency though, since memory is never read again after being written
2026-02-14 22:13:28 +0100 <tomsmeding> (yes, the throughput label is misleading; I checked that a div_pd has 4 there and add_pd 0.5, so indeed it's CPI = 1/throughput)
2026-02-14 22:12:51 +0100 <tomsmeding> and apparently it can even do two of those _mm256_add_epi8 instructions in one cycle, by the CPI of 0.5
2026-02-14 22:12:20 +0100 peterbecich (~Thunderbi@71.84.33.135) (Ping timeout: 256 seconds)
2026-02-14 22:12:10 +0100 <tomsmeding> epi is integer stuff
2026-02-14 22:11:56 +0100 <[exa]> oh these are the epi8 instructions from the intrinsics guide that I ignored every time
2026-02-14 22:11:39 +0100 <tomsmeding> think about that, 64 adds with 1-cycle latency
2026-02-14 22:11:05 +0100 <tomsmeding> or the _mm256 version, or _mm512 if you want to use your juicy AVX512
2026-02-14 22:10:29 +0100 <tomsmeding> _mm_add_epi8 is the one you want here (paddb)
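A minimal sketch of a byte-wise wrapping add with `_mm_add_epi8` (SSE2, `paddb`) from C, assuming an x86 target; the function name is hypothetical. It uses unaligned loads and stores, so it does not depend on buffer alignment, and a scalar loop handles the leftover bytes (or everything on targets without SSE2). `_mm256_add_epi8` (AVX2) and `_mm512_add_epi8` (AVX-512BW) follow the same pattern with 32- and 64-byte registers.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Byte-wise wrapping add: dst[i] = a[i] + b[i] (mod 256).
 * With SSE2, 16 bytes are added per _mm_add_epi8 (paddb). */
void add_bytes_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* loadu/storeu accept any address; _mm_load_si128 would need 16-byte alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }
#endif
    /* Scalar tail for the last n % 16 bytes (or all bytes without SSE2). */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```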
2026-02-14 22:10:17 +0100 <tomsmeding> https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=epi8
2026-02-14 22:10:02 +0100 [exa] learned today
2026-02-14 22:09:48 +0100 <tomsmeding> yes