Newest at the top
| 2026-02-14 22:31:59 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds) |
| 2026-02-14 22:31:44 +0100 | machinedgod | (~machinedg@d75-159-126-101.abhsia.telus.net) machinedgod |
| 2026-02-14 22:31:32 +0100 | <probie> | Vector still has alignment issues though, so I'm not inclined to bend over backwards to get it to work |
| 2026-02-14 22:26:51 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) merijn |
| 2026-02-14 22:26:29 +0100 | <tomsmeding> | storable vectors are just C-style arrays as you expect |
| 2026-02-14 22:26:10 +0100 | <tomsmeding> | so there is indirection between the vector you have in hand and the underlying storage |
| 2026-02-14 22:25:59 +0100 | <[exa]> | probie: also highly suggest checking whether repa/massiv can do this and, if they can, copying what they did. IIRC these would still count as "pure" Haskell. |
| 2026-02-14 22:25:55 +0100 | <tomsmeding> | the idea of unboxed vectors is that they are struct-of-arrays transformed, i.e. a vector of (Int, Int) is actually two vectors under the hood |
| 2026-02-14 22:25:32 +0100 | <tomsmeding> | recommend storable vectors for that though, as they have a sensible withForeignPtr function |
| 2026-02-14 22:25:04 +0100 | <[exa]> | probie: there should be some (ugly but working) way to get a pointer to the vector which can be used as the required target type for the primitive op |
| 2026-02-14 22:24:30 +0100 | infinity0 | (~infinity0@pwned.gg) infinity0 |
| 2026-02-14 22:23:55 +0100 | <probie> | I think if I want to do this in "pure" Haskell, I'm better off ditching vector, and using `MutableByteArray#`s with the explicit SIMD operations from ghc-prim |
| 2026-02-14 22:21:54 +0100 | <[exa]> | :( |
| 2026-02-14 22:21:42 +0100 | <probie> | [exa]: It did fail, and the reads are absolutely not aligned |
| 2026-02-14 22:21:18 +0100 | tcard | (~tcard@2400:4051:5801:7500:cf17:befc:ff82:5303) (Quit: Leaving) |
| 2026-02-14 22:20:25 +0100 | <[exa]> | probie: btw if this fails, try making sure the reads are aligned (no real clue how to help there tho.) |
| 2026-02-14 22:19:44 +0100 | <tomsmeding> | <3 |
| 2026-02-14 22:19:33 +0100 | <[exa]> | tomsmeding: cool I'm going to simd my stupid database string-indexing code :D |
| 2026-02-14 22:19:28 +0100 | L29Ah | (~L29Ah@wikipedia/L29Ah) L29Ah |
| 2026-02-14 22:17:27 +0100 | <geekosaur> | (I typo each of those into the other constantly…) |
| 2026-02-14 22:16:34 +0100 | <tomsmeding> | I can't type any more |
| 2026-02-14 22:16:32 +0100 | <tomsmeding> | s/ghc/gcc/ |
| 2026-02-14 22:16:00 +0100 | <tomsmeding> | -ffast-math dances circles around IEEE semantics, but memory semantics aren't broken down as far as I know |
| 2026-02-14 22:15:42 +0100 | <tomsmeding> | I would be surprised if ghc does this at any optimisation level |
| 2026-02-14 22:15:23 +0100 | <tomsmeding> | ignoring this is a blatant violation of the semantics |
| 2026-02-14 22:15:13 +0100 | <tomsmeding> | and I can also assure you that GHC will not tell LLVM that these things do not alias |
| 2026-02-14 22:15:11 +0100 | <probie> | Even gcc only ignores it if you pass -O3 IIRC |
| 2026-02-14 22:14:52 +0100 | <tomsmeding> | this is DEFINITELY not ignored by llvm |
| 2026-02-14 22:14:40 +0100 | <tomsmeding> | the _mm256 and _mm variants indeed have 3 |
| 2026-02-14 22:14:38 +0100 | <[exa]> | probie: anyway it might be the case that the compiler just ignores it but I'd bet this is the problem number 1 |
| 2026-02-14 22:14:33 +0100 | <tomsmeding> | [exa]: oh I mistyped, I meant _mm512_add_epi8 |
| 2026-02-14 22:14:29 +0100 | merijn | (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds) |
| 2026-02-14 22:14:20 +0100 | <[exa]> | probie: memory order too strong QQ |
| 2026-02-14 22:14:09 +0100 | <tomsmeding> | yes |
| 2026-02-14 22:14:06 +0100 | <probie> | there can be aliasing |
| 2026-02-14 22:14:04 +0100 | <tomsmeding> | probie: the compiler doesn't know that |
| 2026-02-14 22:13:56 +0100 | <probie> | oh wait, <expletive> |
| 2026-02-14 22:13:52 +0100 | <[exa]> | tomsmeding: where do you read that? intel intrinsics guide says 3 per cycle |
| 2026-02-14 22:13:46 +0100 | <probie> | There isn't really a data dependency though, since memory is never read again after being written |
| 2026-02-14 22:13:28 +0100 | <tomsmeding> | (yes, the throughput label is misleading; I checked that a div_pd has 4 there and add_pd 0.5, so indeed it's CPI = 1/throughput) |
| 2026-02-14 22:12:51 +0100 | <tomsmeding> | and apparently it can even do two of those _mm256_add_epi8 instructions in one cycle, by the CPI of 0.5 |
| 2026-02-14 22:12:20 +0100 | peterbecich | (~Thunderbi@71.84.33.135) (Ping timeout: 256 seconds) |
| 2026-02-14 22:12:10 +0100 | <tomsmeding> | epi is integer stuff |
| 2026-02-14 22:11:56 +0100 | <[exa]> | oh these are the epi8 instructions from the intrinsics guide that I ignored every time |
| 2026-02-14 22:11:39 +0100 | <tomsmeding> | think about that, 64 adds with 1-cycle latency |
| 2026-02-14 22:11:05 +0100 | <tomsmeding> | or the _mm256 version, or _mm512 if you want to use your juicy AVX512 |
| 2026-02-14 22:10:29 +0100 | <tomsmeding> | _mm_add_epi8 is the one you want here (paddb) |
| 2026-02-14 22:10:17 +0100 | <tomsmeding> | https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=epi8 |
| 2026-02-14 22:10:02 +0100 | [exa] | learned today |
| 2026-02-14 22:09:48 +0100 | <tomsmeding> | yes |
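
For readers following the `_mm_add_epi8` (paddb) discussion above, here is a minimal C sketch of a byte-wise add over two buffers. It is an illustration, not code from the conversation: the function name `add_bytes` and its parameters are hypothetical. It uses the unaligned load/store intrinsics (`_mm_loadu_si128`, `_mm_storeu_si128`) because the log notes that the reads in question are not aligned, and it assumes an x86 target with SSE2.

```c
#include <emmintrin.h> /* SSE2: _mm_add_epi8, _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] + b[i], with wrap-around (mod 256) bytes.
 * The pointers are not marked restrict, so a compiler has to assume that dst
 * may overlap a or b, which is the aliasing concern raised in the log. */
void add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    /* Main loop: 16 bytes per iteration, with no alignment requirement. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }

    /* Scalar tail for the remaining n % 16 bytes. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Calling `add_bytes(out, a, b, 40)` processes two 16-byte chunks with paddb and the last 8 bytes in the scalar tail. The 256-bit and 512-bit variants mentioned in the log follow the same shape with wider registers, but they need AVX2 or AVX-512 support and the matching compiler flags.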