Newest at the top
2024-04-29 23:05:17 +0200 | <tomsmeding> | there's your problem for sure |
2024-04-29 23:05:07 +0200 | <tomsmeding> | bajeezus nofib-interpreter has just finished building and started executing with --enable-coverage and this is at least 1e4 times slower |
2024-04-29 23:04:49 +0200 | <shapr> | because it shows you how few paths are executed? |
2024-04-29 23:04:39 +0200 | <shapr> | how so? |
2024-04-29 23:04:33 +0200 | <tomsmeding> | but path-sensitive coverage tracking, while more accurate, is very depressing to test with |
2024-04-29 23:04:14 +0200 | <tomsmeding> | sure, path-insensitive coverage tracking is overly optimistic, that's known |
2024-04-29 23:03:59 +0200 | <shapr> | I've been thinking about how to identify code paths, but I dunno yet |
2024-04-29 23:03:58 +0200 | <tomsmeding> | how path-sensitive is your coverage |
2024-04-29 23:03:51 +0200 | <tomsmeding> | that's a complaint about the coverage metric |
2024-04-29 23:03:35 +0200 | <tomsmeding> | that does something about my unsoundness complaint but doesn't solve it |
2024-04-29 23:03:34 +0200 | <shapr> | tomsmeding: Hughes' point is that many functions are reached through multiple code paths |
2024-04-29 23:03:19 +0200 | <shapr> | tomsmeding: yes! that's what I want to do next! |
2024-04-29 23:03:18 +0200 | <tomsmeding> | meh |
2024-04-29 23:03:08 +0200 | <shapr> | I gave a lightning talk on Kudzu at ICFP 2022 and John Hughes brought up the point that he would prefer to see coverage defined not as "executed once" but instead "executed at least N times" |
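[Editor's note: Hughes' "executed at least N times" metric could be sketched roughly like this. This is a hypothetical illustration, not Kudzu's code; it assumes per-tick hit counts, which hpc does record, modelled here as a plain association list.]

```haskell
-- Hypothetical sketch: a tick counts as covered only once it has
-- fired at least n times, rather than at least once.
type TickId = Int

coverageAtLeast :: Int -> [(TickId, Int)] -> Double
coverageAtLeast n ticks
  | null ticks = 1
  | otherwise  =
      fromIntegral (length [t | (t, c) <- ticks, c >= n])
        / fromIntegral (length ticks)
```

With n = 1 this degenerates to ordinary "executed once" coverage.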
2024-04-29 23:02:46 +0200 | <tomsmeding> | but what I would be most interested in is the coverage report after testing |
2024-04-29 23:02:30 +0200 | <tomsmeding> | hm right, test duration is at most linear in your code size |
2024-04-29 23:02:12 +0200 | <shapr> | I don't think it's possible |
2024-04-29 23:01:59 +0200 | shapr | thinks |
2024-04-29 23:01:52 +0200 | <shapr> | There isn't a hard maximum, but I can't imagine a case where that would be a problem |
2024-04-29 23:01:32 +0200 | <tomsmeding> | with some hard maximum, presumably |
2024-04-29 23:01:29 +0200 | <mwnaylor> | Building ghc w/ slapt-src. Aliased python to python3. Piping stdout and stderr to separate files to save the results. Will share results when complete. |
2024-04-29 23:01:16 +0200 | tomsmeding | hasn't read your post yet |
2024-04-29 23:01:16 +0200 | <shapr> | Kudzu's stopping criterion is "hpc did not observe new coverage for N iterations of this PBT" |
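[Editor's note: the stopping criterion shapr describes could be sketched as the loop below. Names and types are invented for illustration (this is not Kudzu's actual implementation); hpc tick IDs are modelled as a set of Ints.]

```haskell
import qualified Data.Set as Set

-- | Run tests until no new coverage has been observed for
-- 'patience' consecutive iterations, then return total coverage.
runUntilPlateau
  :: Int                -- ^ patience: N iterations without new coverage
  -> IO (Set.Set Int)   -- ^ one test run, yielding the ticks it hit
  -> IO (Set.Set Int)
runUntilPlateau patience runOne = go Set.empty 0
  where
    go seen stale
      | stale >= patience = pure seen
      | otherwise = do
          hits <- runOne
          let seen' = Set.union seen hits
          if Set.size seen' > Set.size seen
            then go seen' 0            -- new coverage: reset the counter
            else go seen' (stale + 1)  -- plateau: count this iteration
```

Since each reset requires at least one new tick, the loop runs at most (number of ticks + 1) × patience iterations, which matches tomsmeding's later observation that test duration is at most linear in code size.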
2024-04-29 23:01:10 +0200 | <tomsmeding> | that would be the selling point for me |
2024-04-29 23:01:02 +0200 | <tomsmeding> | it does tell you something in the opposite case, though: if you've exhausted your test budget but you haven't achieved coverage, maybe you need to think about your test case distribution |
2024-04-29 23:00:11 +0200 | <tomsmeding> | true, though adding a more intelligent stopping criterion to PBT sounds like that stopping criterion should give stronger guarantees than "let's do 100 tests" |
2024-04-29 22:59:48 +0200 | <shapr> | PBT isn't proof by exhaustion or SMT solver |
2024-04-29 22:59:32 +0200 | <shapr> | I think that also holds true for property based testing |
2024-04-29 22:59:22 +0200 | <shapr> | oh, I agree with that |
2024-04-29 22:59:17 +0200 | <tomsmeding> | I'm just saying that it's a heuristic, and it's not completely sound |
2024-04-29 22:59:08 +0200 | <tomsmeding> | I'm not saying it's a bad idea |
2024-04-29 22:59:02 +0200 | <tomsmeding> | your heuristic might be very useful and applicable in practical cases |
2024-04-29 22:58:39 +0200 | shapr | thinks about that |

2024-04-29 22:57:14 +0200 | <tomsmeding> | shapr: https://math.stackexchange.com/a/111939/68044 |
2024-04-29 22:56:15 +0200 | <tomsmeding> | (grr, --enable-coverage rebuilds the world again) |
2024-04-29 22:55:30 +0200 | <tomsmeding> | ah right |
2024-04-29 22:55:17 +0200 | <int-e> | (it's *not* QuickCheck because QuickCheck tests embed IO) |
2024-04-29 22:55:08 +0200 | <lambdabot> | Just 45 |
2024-04-29 22:55:07 +0200 | <tomsmeding> | > find (\x -> round (2.000000001 ** fromIntegral (x :: Int) / 1e6) /= (round (2 ^ x / 1e6) :: Int)) [1..] |
2024-04-29 22:54:19 +0200 | <lambdabot> | Positive {getPositive = 58} |
2024-04-29 22:54:19 +0200 | <lambdabot> | *** Failed! Falsifiable (after 60 tests and 2 shrinks): |
2024-04-29 22:54:17 +0200 | <int-e> | @check \(Positive x) -> round (2.000000001 ** fromIntegral (x :: Int) / 1e6) == (round (2 ^ x / 1e6) :: Int) |
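[Editor's note: int-e's @check property can be reproduced outside lambdabot as a small standalone program, assuming the QuickCheck package is available. Per the session above, the smallest failing exponent is 45, though QuickCheck's shrinking happened to stop at 58.]

```haskell
import Test.QuickCheck

-- The rounded scaled powers of 2.000000001 and 2 agree for small
-- exponents, but the relative error compounds until the two sides
-- round differently, despite the property containing no conditionals.
prop_roundedPowers :: Positive Int -> Bool
prop_roundedPowers (Positive x) =
  round (2.000000001 ** fromIntegral x / 1e6)
    == (round (2 ^ x / 1e6) :: Int)

main :: IO ()
main = quickCheck prop_roundedPowers
```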
2024-04-29 22:53:43 +0200 | <tomsmeding> | but my point is that equalities can fail to hold without any conditionals, but seem to be fine for small inputs |
2024-04-29 22:53:26 +0200 | <tomsmeding> | clearly this is a constructed example |
2024-04-29 22:53:16 +0200 | <tomsmeding> | fails at x = 45 |
2024-04-29 22:53:08 +0200 | <tomsmeding> | blegh, runs in my ghci |
2024-04-29 22:53:00 +0200 | <lambdabot> | with actual type ‘Positive Int -> Bool’ |