Newest at the top
| 2026-02-14 22:36:17 +0100 | <tomsmeding> | ah yes, in this case it is |
| 2026-02-14 22:36:01 +0100 | <int-e> | tomsmeding: well, the alignment is already fixed by how you unrolled the loop |
| 2026-02-14 22:36:00 +0100 | <[exa]> | like, ofc llvm is going to brainify that to The Way Better Aligned Load |
| 2026-02-14 22:35:40 +0100 | <tomsmeding> | I'm... not sure what I was thinking |
| 2026-02-14 22:35:33 +0100 | <tomsmeding> | [exa]: good point, my bad, yes loadu is a thing |
| 2026-02-14 22:35:07 +0100 | <tomsmeding> | so it should just generate a prologue |
| 2026-02-14 22:34:58 +0100 | <tomsmeding> | but yes, I guess that llvm will have to deal with unaligned arrays anyway because it cannot assume any arbitrary pointer is aligned |
| 2026-02-14 22:34:47 +0100 | <[exa]> | loadu doesn't exist for epi8? |
| 2026-02-14 22:34:23 +0100 | <tomsmeding> | [exa]: simd instructions do have alignment constraints where normal x86 memory ops don't |
| 2026-02-14 22:33:48 +0100 | <[exa]> | probie: the alignment isn't a totally huge deal (just adds a bit of a weight on whether llvm later chooses the unaligned load or doesn't even trigger) |
| 2026-02-14 22:31:59 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds) |
| 2026-02-14 22:31:44 +0100 | machinedgod | (~machinedg@d75-159-126-101.abhsia.telus.net) machinedgod |
| 2026-02-14 22:31:32 +0100 | <probie> | Vector still has alignment issues though, so I'm not inclined to bend over backwards to get it to work |
| 2026-02-14 22:26:51 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) merijn |
| 2026-02-14 22:26:29 +0100 | <tomsmeding> | storable vectors are just C-style arrays as you expect |
| 2026-02-14 22:26:10 +0100 | <tomsmeding> | so there is indirection between the vector you have in hand and the underlying storage |
| 2026-02-14 22:25:59 +0100 | <[exa]> | probie: also highly suggest having a look at whether repa/massiv can do this and, if they can, copying what they did. IIRC these would still count as "pure" Haskell. |
| 2026-02-14 22:25:55 +0100 | <tomsmeding> | the idea of unboxed vectors is that they are struct-of-arrays transformed, i.e. a vector of (Int, Int) is actually two vectors under the hood |
| 2026-02-14 22:25:32 +0100 | <tomsmeding> | recommend storable vectors for that though, as they have a sensible withForeignPtr function |
| 2026-02-14 22:25:04 +0100 | <[exa]> | probie: there should be some (ugly but working) way to get a pointer to the vector which can be used as the required target type for the primitive op |
| 2026-02-14 22:24:30 +0100 | infinity0 | (~infinity0@pwned.gg) infinity0 |
| 2026-02-14 22:23:55 +0100 | <probie> | I think if I want to do this in "pure" Haskell, I'm better off ditching vector, and using `MutableByteArray#`s with the explicit SIMD operations from ghc-prim |
| 2026-02-14 22:21:54 +0100 | <[exa]> | :( |
| 2026-02-14 22:21:42 +0100 | <probie> | [exa]: It did fail, and the reads are absolutely not aligned |
| 2026-02-14 22:21:18 +0100 | tcard | (~tcard@2400:4051:5801:7500:cf17:befc:ff82:5303) (Quit: Leaving) |
| 2026-02-14 22:20:25 +0100 | <[exa]> | probie: btw if this fails, try making sure the reads are aligned (no real clue how to help there tho.) |
| 2026-02-14 22:19:44 +0100 | <tomsmeding> | <3 |
| 2026-02-14 22:19:33 +0100 | <[exa]> | tomsmeding: cool I'm going to simd my stupid database string-indexing code :D |
| 2026-02-14 22:19:28 +0100 | L29Ah | (~L29Ah@wikipedia/L29Ah) L29Ah |
| 2026-02-14 22:17:27 +0100 | <geekosaur> | (I typo each of those into the other constantly…) |
| 2026-02-14 22:16:34 +0100 | <tomsmeding> | I can't type any more |
| 2026-02-14 22:16:32 +0100 | <tomsmeding> | s/ghc/gcc/ |
| 2026-02-14 22:16:00 +0100 | <tomsmeding> | -ffast-math dances circles around IEEE semantics, but memory semantics aren't broken down as far as I know |
| 2026-02-14 22:15:42 +0100 | <tomsmeding> | I would be surprised if ghc does this at any optimisation level |
| 2026-02-14 22:15:23 +0100 | <tomsmeding> | ignoring this is a blatant violation of the semantics |
| 2026-02-14 22:15:13 +0100 | <tomsmeding> | and I can also assure you that GHC will not tell LLVM that these things do not alias |
| 2026-02-14 22:15:11 +0100 | <probie> | Even gcc only ignores it if you pass -O3 IIRC |
| 2026-02-14 22:14:52 +0100 | <tomsmeding> | this is DEFINITELY not ignored by llvm |
| 2026-02-14 22:14:40 +0100 | <tomsmeding> | the _mm256 and _mm variants indeed have 3 |
| 2026-02-14 22:14:38 +0100 | <[exa]> | probie: anyway it might be the case that the compiler just ignores it but I'd bet this is the problem number 1 |
| 2026-02-14 22:14:33 +0100 | <tomsmeding> | [exa]: oh I mistyped, I meant _mm512_add_epi8 |
| 2026-02-14 22:14:29 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds) |
| 2026-02-14 22:14:20 +0100 | <[exa]> | probie: memory order too strong QQ |
| 2026-02-14 22:14:09 +0100 | <tomsmeding> | yes |
| 2026-02-14 22:14:06 +0100 | <probie> | there can be aliasing |
| 2026-02-14 22:14:04 +0100 | <tomsmeding> | probie: the compiler doesn't know that |
| 2026-02-14 22:13:56 +0100 | <probie> | oh wait, <expletive> |
| 2026-02-14 22:13:52 +0100 | <[exa]> | tomsmeding: where do you read that? intel intrinsics guide says 3 per cycle |
| 2026-02-14 22:13:46 +0100 | <probie> | There isn't really a data dependency though, since memory is never read again after being written |
| 2026-02-14 22:13:28 +0100 | <tomsmeding> | (yes, the throughput label is misleading; I checked that a div_pd has 4 there and add_pd 0.5, so indeed it's CPI = 1/throughput) |
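
The aliasing point raised between 22:13:46 and 22:14:09 (the compiler cannot know that written memory is never read again) can be illustrated with a minimal C sketch. The function name `add_one` and the buffers are hypothetical and not taken from the code under discussion. The sketch only shows why a loop that looks free of data dependencies gains one when the output pointer overlaps the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: adds 1 to each of n bytes read from src and
   stores the result in dst. Neither pointer is marked `restrict`, so
   the compiler has to allow for dst and src overlapping. */
void add_one(uint8_t *dst, const uint8_t *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* If dst == src + 1, this store changes the byte that the
           next iteration loads, so iterations are not independent. */
        dst[i] = (uint8_t)(src[i] + 1);
    }
}
```

With disjoint zero-filled buffers, every output byte is 1. Calling `add_one(buf + 1, buf, 4)` on a five-byte zeroed `buf` instead leaves `{0, 1, 2, 3, 4}`, because each store feeds the following load. A version that loaded all four bytes before storing any would produce `{0, 1, 1, 1, 1}`, so a vectorising compiler must either prove the pointers do not overlap, emit a runtime overlap check, or keep the scalar loop.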