Newest at the top
2024-04-29 23:05:17 +0200 | <tomsmeding> | there's your problem for sure |
2024-04-29 23:05:07 +0200 | <tomsmeding> | bajeezus nofib-interpreter has just finished building and started executing with --enable-coverage and this is at least 1e4 times slower |
2024-04-29 23:04:49 +0200 | <shapr> | because it shows you how few paths are executed? |
2024-04-29 23:04:39 +0200 | <shapr> | how so? |
2024-04-29 23:04:33 +0200 | <tomsmeding> | but path-sensitive coverage tracking, while more accurate, is very depressing to test with |
2024-04-29 23:04:14 +0200 | <tomsmeding> | sure, path-insensitive coverage tracking is overly optimistic, that's known |
2024-04-29 23:03:59 +0200 | <shapr> | I've been thinking about how to identify code paths, but I dunno yet |
2024-04-29 23:03:58 +0200 | <tomsmeding> | how path-sensitive is your coverage |
2024-04-29 23:03:51 +0200 | <tomsmeding> | that's a complaint about the coverage metric |
2024-04-29 23:03:35 +0200 | <tomsmeding> | that does something about my unsoundness complaint but doesn't solve it |
2024-04-29 23:03:34 +0200 | <shapr> | tomsmeding: Hughes' point is that many functions are reached through multiple code paths |
2024-04-29 23:03:19 +0200 | <shapr> | tomsmeding: yes! that's what I want to do next! |
2024-04-29 23:03:18 +0200 | <tomsmeding> | meh |
2024-04-29 23:03:08 +0200 | <shapr> | I gave a lightning talk on Kudzu at ICFP 2022 and John Hughes brought up the point that he would prefer to see coverage defined not as "executed once" but instead "executed at least N times" |
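[Editor's note: Hughes' "executed at least N times" metric could be sketched roughly like this. This is a hypothetical illustration, not Kudzu's code; it assumes per-tick hit counts, which hpc does record, modelled here as a plain association list.]

```haskell
-- Hypothetical sketch: a tick counts as covered only once it has
-- fired at least n times, rather than at least once.
type TickId = Int

coverageAtLeast :: Int -> [(TickId, Int)] -> Double
coverageAtLeast n ticks
  | null ticks = 1
  | otherwise  =
      fromIntegral (length [t | (t, c) <- ticks, c >= n])
        / fromIntegral (length ticks)
```

With n = 1 this degenerates to ordinary "executed once" coverage.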
2024-04-29 23:02:46 +0200 | <tomsmeding> | but what I would be most interested in is the coverage report after testing |
2024-04-29 23:02:30 +0200 | <tomsmeding> | hm right, test duration is at most linear in your code size |
2024-04-29 23:02:12 +0200 | <shapr> | I don't think it's possible |
2024-04-29 23:01:59 +0200 | shapr | thinks |
2024-04-29 23:01:52 +0200 | <shapr> | There isn't a hard maximum, but I can't imagine a case where that would be a problem |
2024-04-29 23:01:32 +0200 | <tomsmeding> | with some hard maximum, presumably |
2024-04-29 23:01:29 +0200 | <mwnaylor> | Building ghc w/ slapt-src. Aliased python to python3. Piping stdout and stderr to separate files to save the results. Will share results when complete. |
2024-04-29 23:01:16 +0200 | tomsmeding | hasn't read your post yet |
2024-04-29 23:01:16 +0200 | <shapr> | Kudzu's stopping criterion is "hpc did not observe new coverage for N iterations of this PBT" |
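[Editor's note: the stopping criterion shapr describes could be sketched as the loop below. Names and types are invented for illustration (this is not Kudzu's actual implementation); hpc tick IDs are modelled as a set of Ints.]

```haskell
import qualified Data.Set as Set

-- | Run tests until no new coverage has been observed for
-- 'patience' consecutive iterations, then return total coverage.
runUntilPlateau
  :: Int                -- ^ patience: N iterations without new coverage
  -> IO (Set.Set Int)   -- ^ one test run, yielding the ticks it hit
  -> IO (Set.Set Int)
runUntilPlateau patience runOne = go Set.empty 0
  where
    go seen stale
      | stale >= patience = pure seen
      | otherwise = do
          hits <- runOne
          let seen' = Set.union seen hits
          if Set.size seen' > Set.size seen
            then go seen' 0            -- new coverage: reset the counter
            else go seen' (stale + 1)  -- plateau: count this iteration
```

Since each reset requires at least one new tick, the loop runs at most (number of ticks + 1) × patience iterations, which matches tomsmeding's later observation that test duration is at most linear in code size.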
2024-04-29 23:01:10 +0200 | <tomsmeding> | that would be the selling point for me |
2024-04-29 23:01:02 +0200 | <tomsmeding> | it does tell you something in the opposite case, though: if you've exhausted your test budget but you haven't achieved coverage, maybe you need to think about your test case distribution |
2024-04-29 23:00:11 +0200 | <tomsmeding> | true, though adding a more intelligent stopping criterion to PBT sounds like that stopping criterion should give stronger guarantees than "let's do 100 tests" |
2024-04-29 22:59:48 +0200 | <shapr> | PBT isn't proof by exhaustion or SMT solver |
2024-04-29 22:59:32 +0200 | <shapr> | I think that also holds true for property based testing |
2024-04-29 22:59:22 +0200 | <shapr> | oh, I agree with that |
2024-04-29 22:59:17 +0200 | <tomsmeding> | I'm just saying that it's a heuristic, and it's not completely sound |
2024-04-29 22:59:08 +0200 | <tomsmeding> | I'm not saying it's a bad idea |
2024-04-29 22:59:02 +0200 | <tomsmeding> | your heuristic might be very useful and applicable in practical cases |
2024-04-29 22:58:39 +0200 | shapr | thinks about that |

2024-04-29 22:57:14 +0200 | <tomsmeding> | shapr: https://math.stackexchange.com/a/111939/68044 |
2024-04-29 22:56:15 +0200 | <tomsmeding> | (grr, --enable-coverage rebuilds the world again) |
2024-04-29 22:55:30 +0200 | <tomsmeding> | ah right |
2024-04-29 22:55:17 +0200 | <int-e> | (it's *not* QuickCheck because QuickCheck tests embed IO) |
2024-04-29 22:55:08 +0200 | <lambdabot> | Just 45 |
2024-04-29 22:55:07 +0200 | <tomsmeding> | > find (\x -> round (2.000000001 ** fromIntegral (x :: Int) / 1e6) /= (round (2 ^ x / 1e6) :: Int)) [1..] |
2024-04-29 22:54:19 +0200 | <lambdabot> | Positive {getPositive = 58} |
2024-04-29 22:54:19 +0200 | <lambdabot> | *** Failed! Falsifiable (after 60 tests and 2 shrinks): |
2024-04-29 22:54:17 +0200 | <int-e> | @check \(Positive x) -> round (2.000000001 ** fromIntegral (x :: Int) / 1e6) == (round (2 ^ x / 1e6) :: Int) |
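[Editor's note: int-e's @check property can be reproduced outside lambdabot as a small standalone program, assuming the QuickCheck package is available. Per the session above, the smallest failing exponent is 45, though QuickCheck's shrinking happened to stop at 58.]

```haskell
import Test.QuickCheck

-- The rounded scaled powers of 2.000000001 and 2 agree for small
-- exponents, but the relative error compounds until the two sides
-- round differently, despite the property containing no conditionals.
prop_roundedPowers :: Positive Int -> Bool
prop_roundedPowers (Positive x) =
  round (2.000000001 ** fromIntegral x / 1e6)
    == (round (2 ^ x / 1e6) :: Int)

main :: IO ()
main = quickCheck prop_roundedPowers
```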
2024-04-29 22:53:43 +0200 | <tomsmeding> | but my point is that equalities can fail to hold without any conditionals, but seem to be fine for small inputs |
2024-04-29 22:53:26 +0200 | <tomsmeding> | clearly this is a constructed example |
2024-04-29 22:53:16 +0200 | <tomsmeding> | fails at x = 45 |
2024-04-29 22:53:08 +0200 | <tomsmeding> | blegh, runs in my ghci |
2024-04-29 22:53:00 +0200 | <lambdabot> | with actual type ‘Positive Int -> Bool’ |