2024/11/11

Newest at the top

2024-11-12 00:54:39 +0100 <glguy> There will be some class of characters you don't mind ending on: whitespace, (, ), etc.
2024-11-12 00:54:18 +0100 <glguy> notFollowedBy or using lookAhead (same idea) to check that you're OK with the boundary that you ended on
2024-11-12 00:52:43 +0100 <glguy> that wouldn't necessarily use parsec. If you want to do it in parsec I expect you'll have to use notFollowedBy to detect that your token ended on an OK boundary
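
A minimal sketch of the notFollowedBy idea glguy describes in the three messages above, assuming a single binary-literal parser over a String input. The names tokenChar and binaryLiteral are hypothetical, and the set of characters treated as "still part of the token" is an assumption chosen for a Scheme-like language, not something stated in the conversation.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: characters that may continue a token. Anything else
-- (whitespace, parentheses, end of input) counts as an acceptable boundary.
-- The exact character set is an assumption for illustration.
tokenChar :: Parser Char
tokenChar = alphaNum <|> oneOf "!$%&*+-/:<=>?@^_~#."

-- Binary literal such as #b0110. The digits are read greedily, then
-- notFollowedBy checks that the literal ended on an acceptable boundary.
-- `try` makes the whole literal fail without consuming input otherwise.
binaryLiteral :: Parser Integer
binaryLiteral = try $ do
  _ <- string "#b"
  digits <- many1 (oneOf "01")
  notFollowedBy tokenChar
  pure (foldl (\acc d -> 2 * acc + (if d == '1' then 1 else 0)) 0 digits)

main :: IO ()
main = do
  print (parse binaryLiteral "" "#b0110")   -- boundary is end of input
  print (parse binaryLiteral "" "#b0110)")  -- boundary is a parenthesis
  print (parse binaryLiteral "" "#b0110a")  -- rejected: 'a' continues the token
```

With this sketch, "#b0110" and "#b0110)" are accepted, while "#b0110a" and "#b01234" are rejected, because the character after the last binary digit still belongs to the token. notFollowedBy consumes no input, so the delimiter is left for whatever parser runs next.
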
2024-11-12 00:50:44 +0100 <glguy> You have to write a program that processes the string using the rules you have in mind; there isn't a shortcut
2024-11-12 00:50:22 +0100 <glguy> you processed the string turning it into tokens until you got to the end of the string
2024-11-12 00:49:50 +0100 <fp> Sure but how did that check the whole string?
2024-11-12 00:49:28 +0100 <glguy> by the thing that turned it into tokens
2024-11-12 00:49:20 +0100 <glguy> if you had it set up with tokens then that's already done
2024-11-12 00:48:46 +0100 <fp> I guess the question becomes, if I have it set up with tokens, how do I write the parser so that it checks if the whole token matches instead of just the beginning
2024-11-12 00:48:24 +0100 Everything (~Everythin@46.211.220.37) (Quit: leaving)
2024-11-12 00:44:17 +0100 <glguy> but if you don't want to completely rethink your design maybe spend some time looking at "notFollowedBy" and hack it together
2024-11-12 00:44:08 +0100 Tuplanolla (~Tuplanoll@91-159-69-59.elisa-laajakaista.fi) (Ping timeout: 244 seconds)
2024-11-12 00:43:48 +0100 <c_wraith> You need to identify a sequence of characters as a single token, and then check that the token is valid *as a token*
2024-11-12 00:43:24 +0100 <c_wraith> you're fundamentally asking about a tokenizing issue
2024-11-12 00:43:22 +0100 <glguy> and then you'd try to turn it into one and find it had invalid characters
2024-11-12 00:43:14 +0100 jinsun (~jinsun@user/jinsun) (Ping timeout: 248 seconds)
2024-11-12 00:43:12 +0100 <glguy> it would because you'd get "#b0110a" as a token that you'd try to process and you'd decide it needs to be a binary number literal because of the first two characters
2024-11-12 00:43:02 +0100 <fp> Or will it?
2024-11-12 00:42:25 +0100 <fp> But right now I'm really just trying to get this to work against single tokens. My test string is '#b0110a', which tokenization won't help
2024-11-12 00:40:36 +0100 <glguy> If you're doing a lisp your tokens might be something like, '(' ')' and sequences of stuff that's delimited by whitespace
2024-11-12 00:40:23 +0100 <c_wraith> oh, then yeah. tokenize and parse separately
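
A rough sketch of the "tokenize and parse separately" approach discussed in the messages above, written with only base so it stands alone. The Token type and the functions tokenize and readBinary are hypothetical names for illustration. The delimiter set (whitespace and parentheses) follows glguy's description of Lisp tokens and is otherwise an assumption.

```haskell
import Data.Char (isSpace)

-- Hypothetical token type for illustration.
data Token = Open | Close | Atom String
  deriving (Eq, Show)

-- Split the input into parentheses and atoms delimited by whitespace
-- or parentheses. No validation happens at this stage.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | c == '('  = Open : tokenize cs
  | c == ')'  = Close : tokenize cs
  | isSpace c = tokenize cs
  | otherwise =
      let (atom, rest) = break isDelimiter (c : cs)
      in Atom atom : tokenize rest
  where
    isDelimiter x = isSpace x || x `elem` "()"

-- Validate a whole atom as a binary literal: the prefix selects the radix,
-- and every remaining character must be a digit of that radix.
readBinary :: String -> Maybe Integer
readBinary ('#' : 'b' : digits@(_ : _))
  | all (`elem` "01") digits = Just (foldl step 0 digits)
  where
    step acc d = 2 * acc + (if d == '1' then 1 else 0)
readBinary _ = Nothing

main :: IO ()
main = do
  print (tokenize "(foo #b0110a)")
  print (readBinary "#b0110")
  print (readBinary "#b0110a")
```

Because "#b0110a" arrives as one atom, the validity check sees the entire token, so the trailing 'a' makes the literal invalid instead of being left over for the next parser.
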
2024-11-12 00:39:53 +0100 <fp> or 234
2024-11-12 00:38:59 +0100 <glguy> Parsec is parameterized to work over an arbitrary stream of arbitrary tokens
2024-11-12 00:38:54 +0100 <lambdabot> have
2024-11-12 00:38:54 +0100 <lambdabot> Variable not in scope:
2024-11-12 00:38:54 +0100 <lambdabot> error:
2024-11-12 00:38:53 +0100 <fp> The issue is that if I have #b01234, it will parse #b01 as a valid number, and then it'll parse 1234 as a valid number
2024-11-12 00:38:53 +0100 <fp> > have a separate parser for each radix. Only accept characters that radix uses
2024-11-12 00:38:36 +0100 <glguy> Ideally you'd process your input string into lexical tokens first and then use parsec over those instead of characters
2024-11-12 00:38:19 +0100 falafel (~falafel@2600:1700:99f4:2050:c99f:7c1:9343:9cff) falafel
2024-11-12 00:37:56 +0100 sawilagar (~sawilagar@user/sawilagar) (Ping timeout: 244 seconds)
2024-11-12 00:37:53 +0100 falafel (~falafel@2600:1700:99f4:2050:c99f:7c1:9343:9cff) (Quit: Leaving)
2024-11-12 00:37:29 +0100 <glguy> Parsec doesn't make it particularly easy to handle these cases, but it's possible
2024-11-12 00:37:25 +0100 <c_wraith> have a separate parser for each radix. Only accept characters that radix uses
2024-11-12 00:37:04 +0100 <glguy> Maybe you want https://hackage.haskell.org/package/parsec-3.1.17.0/docs/Text-Parsec-Combinator.html#v:notFollowedBy
2024-11-12 00:35:02 +0100 <fp> And the point here is just to learn haskell, and I think there's probably some knowledge I'm missing that would allow me to reason about this problem better
2024-11-12 00:34:33 +0100 <probie> Since something like `1+` or `a+b` are normally valid identifier names
2024-11-12 00:33:42 +0100 <fp> yeah
2024-11-12 00:33:37 +0100 <probie> You probably want to reject it for a lisp
2024-11-12 00:33:13 +0100 <Axman6> D:
2024-11-12 00:33:01 +0100 <lambdabot> 1 + x
2024-11-12 00:33:00 +0100 <glguy> > (+) 1x :: Expr
2024-11-12 00:32:54 +0100 <glguy> fp: are you sure you need to worry about it? Haskell doesn't
2024-11-12 00:32:47 +0100 <Axman6> do you want #b10101foo to be valid, and parse 42 and foo?
2024-11-12 00:32:43 +0100 <fp> not =binDigit=
2024-11-12 00:32:25 +0100 <Axman6> so what is the a in that example string?
2024-11-12 00:32:07 +0100 <fp> also this removeChar thing is super hacky and feels wrong
2024-11-12 00:31:44 +0100 <fp> =endBy1 binDigit (choice [removeChar <$> space, removeChar <$> symbol, eof])= is almost what I want (where removeChar :: Char -> ()), but it demands the latter expression be a separator
2024-11-12 00:31:29 +0100 <Axman6> I think your message got cut off, last I see is "<$> symbol,"
2024-11-12 00:30:26 +0100 <fp> Hey I'm trying to parse numbers with Parsec, and I'm a bit stuck (this is for the 48h scheme tutorial). I'm trying to parse numbers of various radix and I need to avoid accepting numbers where the first digits are valid in the radix, but later digits aren't, e.g. #b010001a. The regex for what I want is /#b[01]+\b/, but I'm struggling to work out how to implement the \b. =endBy1 binDigit (choice [removeChar <$> space, removeChar <$> symbol,