2026/02/14

Newest at the top

2026-02-14 22:36:17 +0100 <tomsmeding> ah yes, in this case it is
2026-02-14 22:36:01 +0100 <int-e> tomsmeding: well, the alignment is already fixed by how you unrolled the loop
2026-02-14 22:36:00 +0100 <[exa]> like, ofc llvm is going to brainify that to The Way Better Aligned Load
2026-02-14 22:35:40 +0100 <tomsmeding> I'm... not sure what I was thinking
2026-02-14 22:35:33 +0100 <tomsmeding> [exa]: good point, my bad, yes loadu is a thing
2026-02-14 22:35:07 +0100 <tomsmeding> so it should just generate a prologue
2026-02-14 22:34:58 +0100 <tomsmeding> but yes, I guess that llvm will have to deal with unaligned arrays anyway because it cannot assume any arbitrary pointer is aligned
2026-02-14 22:34:47 +0100 <[exa]> loadu doesn't exist for epi8?
2026-02-14 22:34:23 +0100 <tomsmeding> [exa]: simd instructions do have alignment constraints where normal x86 memory ops don't
2026-02-14 22:33:48 +0100 <[exa]> probie: the alignment isn't a totally huge deal (just adds a bit of a weight on whether llvm later chooses the unaligned load or doesn't even trigger)
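Note on the alignment messages above: the "prologue" tomsmeding mentions can be pictured as a short scalar loop that runs until the pointer reaches a vector-width boundary, followed by a main loop over aligned blocks and a scalar tail. The C sketch below is a hand-written illustration of that shape, not LLVM output. The names `BLOCK`, `prologue_len` and `sum_bytes` are made up for this note, the 32-byte width is an assumption (256-bit vectors), and the inner block loop is plain scalar code standing in for an aligned vector load. It also assumes that casting a pointer to `uintptr_t` exposes its address bits, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 32 }; /* assumed vector width in bytes (256-bit) */

/* Bytes to handle one at a time before p reaches a BLOCK boundary,
   capped at n so short inputs never run past the end. */
size_t prologue_len(const unsigned char *p, size_t n) {
    size_t mis = (size_t)((uintptr_t)p % BLOCK);
    size_t pre = (mis == 0) ? 0 : (size_t)BLOCK - mis;
    return pre < n ? pre : n;
}

/* Sums n bytes using the prologue / aligned body / epilogue structure. */
unsigned sum_bytes(const unsigned char *p, size_t n) {
    unsigned total = 0;
    size_t i = 0;
    size_t pre = prologue_len(p, n);

    for (; i < pre; i++)                  /* scalar prologue */
        total += p[i];

    for (; i + BLOCK <= n; i += BLOCK) {  /* aligned body */
        assert((uintptr_t)(p + i) % BLOCK == 0);
        for (size_t j = 0; j < BLOCK; j++) /* stand-in for one aligned vector load */
            total += p[i + j];
    }

    for (; i < n; i++)                    /* scalar epilogue */
        total += p[i];
    return total;
}
```

Calling `sum_bytes(buf + 3, 100)` on a deliberately offset pointer goes through all three loops. The alternative discussed above, an unaligned load (loadu) in the body, would drop the prologue entirely.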
2026-02-14 22:31:59 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds)
2026-02-14 22:31:44 +0100 machinedgod (~machinedg@d75-159-126-101.abhsia.telus.net) machinedgod
2026-02-14 22:31:32 +0100 <probie> Vector still has alignment issues though, so I'm not inclined to bend over backwards to get it to work
2026-02-14 22:26:51 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) merijn
2026-02-14 22:26:29 +0100 <tomsmeding> storable vectors are just C-style arrays as you expect
2026-02-14 22:26:10 +0100 <tomsmeding> so there is indirection between the vector you have in hand and the underlying storage
2026-02-14 22:25:59 +0100 <[exa]> probie: also highly suggest having a look at whether repa/massiv can do this and, if they can, copying what they did. IIRC these would still count as "pure" haskell.
2026-02-14 22:25:55 +0100 <tomsmeding> the idea of unboxed vectors is that they are struct-of-arrays transformed, i.e. a vector of (Int, Int) is actually two vectors under the hood
2026-02-14 22:25:32 +0100 <tomsmeding> recommend storable vectors for that though, as they have a sensible withForeignPtr function
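Note on the layout messages above: the difference between a C-style array of pairs and the struct-of-arrays form tomsmeding describes for unboxed vectors can be sketched in C. This only illustrates the two memory layouts; it is not how the vector package is implemented, and every name here (`VEC_LEN`, `struct aos`, `struct soa`, `aos_to_soa`, `soa_index`) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { VEC_LEN = 4 }; /* illustrative fixed length */

/* Array-of-structs: one contiguous buffer of pairs, like a C array. */
struct pair { long fst; long snd; };
struct aos { struct pair elems[VEC_LEN]; };

/* Struct-of-arrays: one buffer per component of the pair. */
struct soa { long fst[VEC_LEN]; long snd[VEC_LEN]; };

/* Splits the pair buffer into two component buffers. */
void aos_to_soa(const struct aos *in, struct soa *out) {
    for (size_t i = 0; i < VEC_LEN; i++) {
        out->fst[i] = in->elems[i].fst;
        out->snd[i] = in->elems[i].snd;
    }
}

/* Indexing the SoA form reassembles a pair from both buffers,
   which is the indirection between the vector and its storage. */
struct pair soa_index(const struct soa *v, size_t i) {
    assert(i < VEC_LEN);
    struct pair p = { v->fst[i], v->snd[i] };
    return p;
}
```

With the SoA form there is no single pointer to "the elements", so handing the data to a primitive that expects one flat buffer is only straightforward in the array-of-structs case.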
2026-02-14 22:25:04 +0100 <[exa]> probie: there should be some (ugly but working) way to get a pointer to the vector which can be used as the required target type for the primitive op
2026-02-14 22:24:30 +0100 infinity0 (~infinity0@pwned.gg) infinity0
2026-02-14 22:23:55 +0100 <probie> I think if I want to do this in "pure" Haskell, I'm better off ditching vector, and using `MutableByteArray#`s with the explicit SIMD operations from ghc-prim
2026-02-14 22:21:54 +0100 <[exa]> :(
2026-02-14 22:21:42 +0100 <probie> [exa]: It did fail, and the reads are absolutely not aligned
2026-02-14 22:21:18 +0100 tcard (~tcard@2400:4051:5801:7500:cf17:befc:ff82:5303) (Quit: Leaving)
2026-02-14 22:20:25 +0100 <[exa]> probie: btw if this fails, try making sure the reads are aligned (no real clue how to help there tho.)
2026-02-14 22:19:44 +0100 <tomsmeding> <3
2026-02-14 22:19:33 +0100 <[exa]> tomsmeding: cool I'm going to simd my stupid database string-indexing code :D
2026-02-14 22:19:28 +0100 L29Ah (~L29Ah@wikipedia/L29Ah) L29Ah
2026-02-14 22:17:27 +0100 <geekosaur> (I typo each of those into the other constantly…)
2026-02-14 22:16:34 +0100 <tomsmeding> I can't type any more
2026-02-14 22:16:32 +0100 <tomsmeding> s/ghc/gcc/
2026-02-14 22:16:00 +0100 <tomsmeding> -ffast-math dances circles around IEEE semantics, but memory semantics aren't broken down as far as I know
2026-02-14 22:15:42 +0100 <tomsmeding> I would be surprised if ghc does this at any optimisation level
2026-02-14 22:15:23 +0100 <tomsmeding> ignoring this is a blatant violation of the semantics
2026-02-14 22:15:13 +0100 <tomsmeding> and I can also assure you that GHC will not tell LLVM that these things do not alias
2026-02-14 22:15:11 +0100 <probie> Even gcc only ignores it if you pass -O3 IIRC
2026-02-14 22:14:52 +0100 <tomsmeding> this is DEFINITELY not ignored by llvm
2026-02-14 22:14:40 +0100 <tomsmeding> the _mm256 and _mm variants indeed have 3
2026-02-14 22:14:38 +0100 <[exa]> probie: anyway it might be the case that the compiler just ignores it but I'd bet this is the problem number 1
2026-02-14 22:14:33 +0100 <tomsmeding> [exa]: oh I mistyped, I meant _mm512_add_epi8
2026-02-14 22:14:29 +0100 merijn (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds)
2026-02-14 22:14:20 +0100 <[exa]> probie: memory order too strong QQ
2026-02-14 22:14:09 +0100 <tomsmeding> yes
2026-02-14 22:14:06 +0100 <probie> there can be aliasing
2026-02-14 22:14:04 +0100 <tomsmeding> probie: the compiler doesn't know that
2026-02-14 22:13:56 +0100 <probie> oh wait, <expletive>
2026-02-14 22:13:52 +0100 <[exa]> tomsmeding: where do you read that? intel intrinsics guide says 3 per cycle
2026-02-14 22:13:46 +0100 <probie> There isn't really a data dependency though, since memory is never read again after being written
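Note on the aliasing messages above: a small C sketch of why "memory is never read again after being written" is something the compiler cannot assume from pointer types alone. The function `add_bytes` and the driver `overlapping_demo` are hypothetical names for this note. When the destination overlaps the inputs, a store in one iteration is read back in the next, so the loop carries a real data dependency and reordering or widening the loads would change the result.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise byte addition. dst, a and b are plain pointers,
   so nothing here says they refer to distinct memory. */
void add_bytes(unsigned char *dst, const unsigned char *a,
               const unsigned char *b, size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(a[i] + b[i]);
}

/* Calls add_bytes with dst shifted one byte into the input buffer:
   iteration i writes buf[i + 1], which iteration i + 1 then reads. */
void overlapping_demo(unsigned char buf[8]) {
    add_bytes(buf + 1, buf, buf, 7);
}
```

Starting from eight bytes of value 1, `overlapping_demo` leaves the buffer as 1, 2, 4, 8, 16, 32, 64, 128, whereas the same call with a separate output buffer gives 2 everywhere. In C, `restrict` on the parameters is the way to promise the compiler there is no overlap.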
2026-02-14 22:13:28 +0100 <tomsmeding> (yes, the throughput label is misleading; I checked that a div_pd has 4 there and add_pd 0.5, so indeed it's CPI = 1/throughput)