2025/02/25

Newest at the top

2025-02-25 10:32:59 +0100 <Athas> tomsmeding: very promising!
2025-02-25 10:32:38 +0100 <tomsmeding> Athas: I used fneural_A for my library and fneural for 'ad' https://git.tomsmeding.com/ad-dual/tree/examples/Numeric/ADDual/Examples.hs
2025-02-25 10:32:22 +0100 ski <https://downloads.haskell.org/ghc/8.4-latest/docs/html/users_guide/parallel.html#data-parallel-has…>,<https://wiki.haskell.org/GHC/Data_Parallel_Haskell>,<https://www.cs.cmu.edu/~scandal/cacm/cacm2.html>,<https://web.archive.org/web/20040806031752/http://cgi.cse.unsw.edu.au/~chak/papers/papers.html#ndp…>,<https://en.wikipedia.org/wiki/NESL>,<https://en.wikipedia.org/wiki/Data_parallelism> )
2025-02-25 10:32:22 +0100 ski . o O ( "Data Parallel Haskell" (GHC 6.10 - 8.4)
2025-02-25 10:32:04 +0100 <tomsmeding> I benchmarked this on a simple 2-hidden-layer neural network with relu activation and softmax at the end; when input and hidden layers are all width N, then when N goes from 100 to 2000, the speedup over 'ad' goes from 26x to ~550x
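For readers unfamiliar with the benchmark shape, the following is a minimal sketch of a 2-hidden-layer network with relu activations and a final softmax. It is an assumption-laden illustration using plain lists, not the fneural or fneural_A code from the linked examples; the names Layer, dense and network are hypothetical.

```haskell
-- Hypothetical sketch; not the benchmark code from ad-dual.
-- A dense layer: weight matrix (list of rows) and bias vector.
type Layer a = ([[a]], [a])

relu :: (Num a, Ord a) => a -> a
relu x = max 0 x

-- Softmax with the maximum subtracted first for numerical stability.
softmax :: (Floating a, Ord a) => [a] -> [a]
softmax xs = map (/ total) es
  where
    m     = maximum xs
    es    = map (\x -> exp (x - m)) xs
    total = sum es

-- Matrix-vector product plus bias.
dense :: Num a => Layer a -> [a] -> [a]
dense (w, b) x = zipWith (+) b [sum (zipWith (*) row x) | row <- w]

-- Input -> hidden 1 -> hidden 2 -> output, hence three weight matrices.
network :: (Floating a, Ord a) => Layer a -> Layer a -> Layer a -> [a] -> [a]
network l1 l2 l3 =
  softmax . dense l3 . map relu . dense l2 . map relu . dense l1
```

Because the sketch is polymorphic in the scalar type, an AD library can in principle instantiate it with its own number type.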
2025-02-25 10:31:54 +0100 <Athas> How close is this to working?
2025-02-25 10:29:37 +0100 <tomsmeding> 'ad' also has unsafePerformIO in the same positions
2025-02-25 10:29:07 +0100 ThePenguin (~ThePengui@cust-95-80-24-166.csbnet.se) ThePenguin
2025-02-25 10:28:51 +0100 <tomsmeding> I guess that's called "runST . unsafeCoerce"
2025-02-25 10:28:41 +0100 <tomsmeding> and there isn't even unsafePerformST!
2025-02-25 10:28:23 +0100 <tomsmeding> Athas: I might be able to switch to ST, but because the individual scalar operations are effectful (they mutate the tape), there's going to be some unsafePerformIO-like thing there anyway
2025-02-25 10:27:08 +0100 ThePenguin (~ThePengui@cust-95-80-24-166.csbnet.se) (Remote host closed the connection)
2025-02-25 10:26:40 +0100 why (~why@n218250229238.netvigator.com) (Quit: Client closed)
2025-02-25 10:26:40 +0100 <tomsmeding> if so, then exactly the same thing should work here
2025-02-25 10:26:24 +0100 <Athas> Yes, so that ought to work. That is good.
2025-02-25 10:25:48 +0100 <tomsmeding> Athas: I think if you 'diffF' on a function that uses 'grad', you instantiate the 'a' in the type of 'grad' with 'AD s (Forward a)'; does that sound right?
2025-02-25 10:24:47 +0100 <tomsmeding> nothing, it seems, semantically?
2025-02-25 10:24:29 +0100 <tomsmeding> *supposed
2025-02-25 10:24:27 +0100 <tomsmeding> I'm not familiar enough with the 'ad' API to figure out what that 'AD' type is suppose to do
2025-02-25 10:22:59 +0100 dsrt^ (~dsrt@108.192.66.114) (Ping timeout: 260 seconds)
2025-02-25 10:22:50 +0100 <tomsmeding> the fact that that package's name is so short is really inconvenient. :P
2025-02-25 10:22:36 +0100 tomsmeding opens the 'ad' docs
2025-02-25 10:22:30 +0100 off^ (~off@108.192.66.114) (Ping timeout: 268 seconds)
2025-02-25 10:22:19 +0100 <Athas> It's 'diffF' on a function that uses 'grad'.
2025-02-25 10:21:55 +0100 <Athas> Operationally yes. I don't know what the type of that looks like. I was referring to the surface syntax.
2025-02-25 10:21:13 +0100 <tomsmeding> i.e. doing reverse mode, but secretly the scalars are dual numbers?
2025-02-25 10:21:08 +0100 califax (~califax@user/califx) califx
2025-02-25 10:21:00 +0100 <tomsmeding> wouldn't that be Reverse s (Forward Double) in 'ad'?
2025-02-25 10:20:49 +0100 califax (~califax@user/califx) (Remote host closed the connection)
2025-02-25 10:20:40 +0100 <Athas> It Just Works in 'ad'.
2025-02-25 10:20:27 +0100 <Athas> jvp (\ ... vjp ...), but with the weird 'ad' names instead.
2025-02-25 10:20:17 +0100 <tomsmeding> and I think you may be right that IO is overkill here!
2025-02-25 10:19:57 +0100 Smiles (uid551636@id-551636.lymington.irccloud.com) Smiles
2025-02-25 10:19:49 +0100 <tomsmeding> how would that look in 'ad'?
2025-02-25 10:19:31 +0100 <Athas> But this cannot possibly do reverse-then-forward, right?
2025-02-25 10:19:14 +0100 <tomsmeding> surely they could be
2025-02-25 10:19:09 +0100 <Athas> I guess they could be.
2025-02-25 10:19:01 +0100 <Athas> Are dual numbers Storable?
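On the Storable question, here is a hedged sketch of how a dual number could get a Storable instance by laying its two components out back to back. The Dual type, component and roundTrip are hypothetical names for illustration; neither 'ad' nor ad-dual is claimed to do it this way.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (castPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical dual number: primal value and tangent.
data Dual a = Dual a a
  deriving (Show, Eq)

-- Proxy for the component type; the result is never evaluated,
-- because sizeOf and alignment ignore their argument's value.
component :: Dual a -> a
component _ = undefined

-- Store the pair like a 2-element array of the component type.
instance Storable a => Storable (Dual a) where
  sizeOf d    = 2 * sizeOf (component d)
  alignment d = alignment (component d)
  peek p = do
    let q = castPtr p
    x  <- peekElemOff q 0
    dx <- peekElemOff q 1
    pure (Dual x dx)
  poke p (Dual x dx) = do
    let q = castPtr p
    pokeElemOff q 0 x
    pokeElemOff q 1 dx

-- Write a value to temporary memory and read it back.
roundTrip :: Storable a => a -> IO a
roundTrip v = alloca $ \p -> poke p v >> peek p
```

With an instance like this, a storable vector of dual numbers would be a flat buffer of interleaved primal and tangent values.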
2025-02-25 10:18:52 +0100 <Athas> This looks compelling, but why unsafePerformIO directly? Can this not be expressed with ST?
2025-02-25 10:18:43 +0100 AlexZenon (~alzenon@178.34.162.44)
2025-02-25 10:18:18 +0100 <tomsmeding> it doesn't even assume that 'a' is Double, so you should be able to instantiate 'a' with a standard forward-mode dual number
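As an illustration of what a "standard forward-mode dual number" can look like, here is a minimal sketch. The Dual type and diffDual are hypothetical names, not the types used by 'ad' or ad-dual; only a Num instance is given, so it covers polynomial functions.

```haskell
-- Hypothetical forward-mode dual number: a value paired with its derivative.
data Dual a = Dual { primal :: a, tangent :: a }
  deriving (Show, Eq)

instance Num a => Num (Dual a) where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx - Dual y dy = Dual (x - y) (dx - dy)
  -- Product rule.
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs (Dual x dx)       = Dual (abs x) (dx * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  -- Constants carry a zero tangent.
  fromInteger n         = Dual (fromInteger n) 0

-- Derivative of a scalar function at a point: seed the tangent with 1.
diffDual :: Num a => (Dual a -> Dual a) -> a -> a
diffDual f x = tangent (f (Dual x 1))
```

For example, diffDual (\x -> x*x*x + 2*x) 3 evaluates to 29, matching 3x^2 + 2 at x = 3. Any code that is polymorphic in its scalar type can be run on such a type to get forward-mode derivatives.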
2025-02-25 10:17:49 +0100 <Athas> Yes, I think it's fine for all of my uses, but honestly just 'VS.Vector a' would work as well.
2025-02-25 10:17:38 +0100 <tomsmeding> Or a boxed vector of vectors!
2025-02-25 10:17:25 +0100 <tomsmeding> it just means that you can pass in a single vector if you want, but also a list of vectors.
2025-02-25 10:17:02 +0100 <Athas> This looks compelling. Using 'f (VS.Vector a)' instead of just 'a' is to make it easier to have some efficient representation without imposing constraints on 'f'?
2025-02-25 10:15:48 +0100 <tomsmeding> also that
2025-02-25 10:15:40 +0100 <Athas> Yes, that can be fixed at a higher level.
2025-02-25 10:15:12 +0100 <tomsmeding> but that's rather non-fundamental
2025-02-25 10:15:04 +0100 <tomsmeding> arrays are single-dimensional only
2025-02-25 10:14:57 +0100 <tomsmeding> full disclosure: I've been hacking on something like this; the API currently looks like this: gradient' from here ( https://git.tomsmeding.com/ad-dual/tree/src/Numeric/ADDual/Array/Internal.hs ), where 'VDual s Double' implements the classes from here ( https://git.tomsmeding.com/ad-dual/tree/src/Numeric/ADDual/VectorOps.hs )