Newest at the top
2024-11-12 00:54:39 +0100 | <glguy> | There will be some class of characters you don't mind ending on: whitespace, (, ), etc. |
2024-11-12 00:54:18 +0100 | <glguy> | notFollowedBy or using lookAhead (same idea) to check that you're OK with the boundary that you ended on |
2024-11-12 00:52:43 +0100 | <glguy> | that wouldn't necessarily use parsec. If you want to do it in parsec I expect you'll have to use notFollowedBy to detect that your token ended on an OK boundary |
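[Note] A minimal sketch of the boundary check glguy describes above, assuming Parsec over a plain String. The names `delimiter`, `tokenEnd`, `tokenEnd'` and `binLiteral` are hypothetical, and the delimiter set (whitespace, parentheses, end of input) is an assumption rather than anything fixed in the discussion. Both the `lookAhead` and the `notFollowedBy` variant are shown, since they express the same idea.

```haskell
import Data.Char (digitToInt, isSpace)
import Data.List (foldl')
import Text.Parsec
import Text.Parsec.String (Parser)

-- Characters a literal is allowed to stop in front of (assumed set).
delimiter :: Parser ()
delimiter = () <$ (space <|> oneOf "()")

-- lookAhead variant: succeed, consuming nothing, only if a delimiter
-- or the end of input comes next.
tokenEnd :: Parser ()
tokenEnd = lookAhead (delimiter <|> eof)

-- notFollowedBy variant: fail if the next character is anything
-- other than a delimiter. At end of input `satisfy` fails, so this succeeds.
tokenEnd' :: Parser ()
tokenEnd' = notFollowedBy (satisfy (\c -> not (isSpace c) && c `notElem` "()"))

-- "#b" followed by one or more binary digits, ending on an OK boundary.
binLiteral :: Parser Integer
binLiteral = do
  _  <- try (string "#b")
  ds <- many1 (oneOf "01")
  tokenEnd
  pure (foldl' (\acc d -> 2 * acc + toInteger (digitToInt d)) 0 ds)
```

In GHCi, `parse binLiteral "" "#b0110"` gives `Right 6`, while `"#b0110a"` and `"#b01234"` are rejected with a parse error instead of being accepted as a prefix.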
2024-11-12 00:50:44 +0100 | <glguy> | You have to write a program that processes the string using the rules you have in mind; there isn't a shortcut |
2024-11-12 00:50:22 +0100 | <glguy> | you processed the string turning it into tokens until you got to the end of the string |
2024-11-12 00:49:50 +0100 | <fp> | Sure but how did that check the whole string? |
2024-11-12 00:49:28 +0100 | <glguy> | by the thing that turned it into tokens |
2024-11-12 00:49:20 +0100 | <glguy> | if you had it set up with tokens then that's already done |
2024-11-12 00:48:46 +0100 | <fp> | I guess the question becomes, if I have it set up with tokens, how do I write the parser so that it checks if the whole token matches instead of just the beginning? |
2024-11-12 00:48:24 +0100 | Everything | (~Everythin@46.211.220.37) (Quit: leaving) |
2024-11-12 00:44:17 +0100 | <glguy> | but if you don't want to completely rethink your design maybe spend some time looking at "notFollowedBy" and hack it together |
2024-11-12 00:44:08 +0100 | Tuplanolla | (~Tuplanoll@91-159-69-59.elisa-laajakaista.fi) (Ping timeout: 244 seconds) |
2024-11-12 00:43:48 +0100 | <c_wraith> | You need to identify a sequence of characters as a single token, and then check that the token is valid *as a token* |
2024-11-12 00:43:24 +0100 | <c_wraith> | you're fundamentally asking about a tokenizing issue |
2024-11-12 00:43:22 +0100 | <glguy> | and then you'd try to turn it into one and find it had invalid characters |
2024-11-12 00:43:14 +0100 | jinsun | (~jinsun@user/jinsun) (Ping timeout: 248 seconds) |
2024-11-12 00:43:12 +0100 | <glguy> | it would because you'd get "#b0110a" as a token that you'd try to process and you'd decide it needs to be a binary number literal because of the first two characters |
2024-11-12 00:43:02 +0100 | <fp> | Or will it? |
2024-11-12 00:42:25 +0100 | <fp> | But right now I'm really just trying to get this to work against single tokens. My test string is '#b0110a', which tokenization won't help with |
2024-11-12 00:40:36 +0100 | <glguy> | If you're doing a lisp your tokens might be something like, '(' ')' and sequences of stuff that's delimited by whitespace |
2024-11-12 00:40:23 +0100 | <c_wraith> | oh, then yeah. tokenize and parse separately |
2024-11-12 00:39:53 +0100 | <fp> | or 234 |
2024-11-12 00:38:59 +0100 | <glguy> | Parsec is parameterized to work over an arbitrary stream of arbitrary tokens |
2024-11-12 00:38:54 +0100 | <lambdabot> | have |
2024-11-12 00:38:54 +0100 | <lambdabot> | Variable not in scope: |
2024-11-12 00:38:54 +0100 | <lambdabot> | error: |
2024-11-12 00:38:53 +0100 | <fp> | The issue is that if I have #b01234, it will parse #b01 as a valid number, and then it'll parse 1234 as a valid number |
2024-11-12 00:38:53 +0100 | <fp> | > have a separate parser for each radix. Only accept characters that radix uses |
2024-11-12 00:38:36 +0100 | <glguy> | Ideally you'd process your input string into lexical tokens first and then use parsec over those instead of characters |
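[Note] A rough sketch of the tokenize-first approach, assuming Lisp-style tokens: `(`, `)`, and runs of characters delimited by whitespace or parentheses. It does not use Parsec. The names `Token`, `tokenize` and `readBinary` are hypothetical. Each atom is validated as a whole, so a string like "#b0110a" is classified as a binary literal by its first two characters and then rejected for its invalid digit.

```haskell
import Data.Char (isSpace)
import Data.List (foldl')

data Token = TOpen | TClose | TAtom String
  deriving (Eq, Show)

-- Split the input into parentheses and delimiter-separated atoms.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c:cs)
  | isSpace c = tokenize cs
  | c == '('  = TOpen  : tokenize cs
  | c == ')'  = TClose : tokenize cs
  | otherwise =
      let (atom, rest) = break (\x -> isSpace x || x `elem` "()") (c:cs)
      in TAtom atom : tokenize rest

-- Validate an entire atom as a binary literal, not just its prefix.
readBinary :: String -> Either String Integer
readBinary ('#':'b':ds)
  | not (null ds) && all (`elem` "01") ds = Right (foldl' step 0 ds)
  | otherwise = Left ("invalid binary literal: #b" ++ ds)
  where
    step acc d = 2 * acc + (if d == '1' then 1 else 0)
readBinary s = Left ("not a binary literal: " ++ s)
```

Here `tokenize "(#b0110a 234)"` yields `[TOpen,TAtom "#b0110a",TAtom "234",TClose]`, and `readBinary "#b0110a"` returns a `Left`, whereas `readBinary "#b0110"` returns `Right 6`.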
2024-11-12 00:38:19 +0100 | falafel | (~falafel@2600:1700:99f4:2050:c99f:7c1:9343:9cff) falafel |
2024-11-12 00:37:56 +0100 | sawilagar | (~sawilagar@user/sawilagar) (Ping timeout: 244 seconds) |
2024-11-12 00:37:53 +0100 | falafel | (~falafel@2600:1700:99f4:2050:c99f:7c1:9343:9cff) (Quit: Leaving) |
2024-11-12 00:37:29 +0100 | <glguy> | Parsec doesn't make it particularly easy to handle these cases, but it's possible |
2024-11-12 00:37:25 +0100 | <c_wraith> | have a separate parser for each radix. Only accept characters that radix uses |
2024-11-12 00:37:04 +0100 | <glguy> | Maybe you want https://hackage.haskell.org/package/parsec-3.1.17.0/docs/Text-Parsec-Combinator.html#v:notFollowedBy |
2024-11-12 00:35:02 +0100 | <fp> | And the point here is just to learn haskell, and I think there's probably some knowledge I'm missing that would allow me to reason about this problem better |
2024-11-12 00:34:33 +0100 | <probie> | Since something like `1+` or `a+b` are normally valid identifier names |
2024-11-12 00:33:42 +0100 | <fp> | yeah |
2024-11-12 00:33:37 +0100 | <probie> | You probably want to reject it for a lisp |
2024-11-12 00:33:13 +0100 | <Axman6> | D: |
2024-11-12 00:33:01 +0100 | <lambdabot> | 1 + x |
2024-11-12 00:33:00 +0100 | <glguy> | > (+) 1x :: Expr |
2024-11-12 00:32:54 +0100 | <glguy> | fp: are you sure you need to worry about it? Haskell doesn't |
2024-11-12 00:32:47 +0100 | <Axman6> | do you want #b10101foo to be valid, and parse 21 and foo? |
2024-11-12 00:32:43 +0100 | <fp> | not =binDigit= |
2024-11-12 00:32:25 +0100 | <Axman6> | so what is the a in that example string? |
2024-11-12 00:32:07 +0100 | <fp> | also this removeChar thing is super hacky and feels wrong |
2024-11-12 00:31:44 +0100 | <fp> | =endBy1 binDigit (choice [removeChar <$> space, removeChar <$> symbol, eof])= is almost what I want (where removeChar :: Char -> ()), but it demands the latter expression be a separator |
2024-11-12 00:31:29 +0100 | <Axman6> | I think your message got cut off, last I see is "<$> symbol," |
2024-11-12 00:30:26 +0100 | <fp> | Hey I'm trying to parse numbers with Parsec, and I'm a bit stuck (this is for the 48h scheme tutorial). I'm trying to parse numbers of various radix and I need to avoid accepting numbers where the first digits are valid in the radix, but later digits aren't, e.g. #b010001a. The regex for what I want is /#b[01]+\b/, but I'm struggling to work out how to implement the \b. =endBy1 binDigit (choice [removeChar <$> space, removeChar <$> symbol, |