c-snippets  Hex Artifact Content

Artifact b71ea7749c9a7143212d15b6a5801d6b7e31cb91:

Wiki page [Toekin] by stephan 2008-02-25 09:14:26.
0000: 44 20 32 30 30 38 2d 30 32 2d 32 35 54 30 39 3a  D 2008-02-25T09:
0010: 31 34 3a 32 36 0a 4c 20 54 6f 65 6b 69 6e 0a 50  14:26.L Toekin.P
0020: 20 38 30 35 38 35 64 61 32 62 36 62 62 34 36 35   80585da2b6bb465
0030: 36 61 65 61 33 63 30 61 33 38 61 61 35 32 37 36  6aea3c0a38aa5276
0040: 65 38 35 62 31 61 64 63 38 0a 55 20 73 74 65 70  e85b1adc8.U step
0050: 68 61 6e 0a 57 20 31 31 32 38 0a 28 54 68 69 73  han.W 1128.(This
0060: 20 70 61 67 65 20 69 73 20 66 61 72 20 66 72 6f   page is far fro
0070: 6d 20 63 6f 6d 70 6c 65 74 65 2e 20 53 65 65 20  m complete. See 
0080: 74 68 65 20 3c 74 74 3e 74 6f 65 6b 69 6e 3c 2f  the <tt>toekin</
0090: 74 74 3e 20 73 75 62 64 69 72 20 6f 66 20 74 68  tt> subdir of th
00a0: 65 20 73 6f 75 72 63 65 20 74 72 65 65 20 66 6f  e source tree fo
00b0: 72 20 74 68 65 20 73 6f 75 72 63 65 73 2e 29 0d  r the sources.).
00c0: 0a 0d 0a 54 68 65 20 22 54 6f 65 6b 69 6e 22 20  ...The "Toekin" 
00d0: 28 70 72 6f 6e 6f 75 6e 63 65 64 20 22 74 6f 6b  (pronounced "tok
00e0: 65 6e 22 29 20 41 50 49 20 69 73 20 61 20 72 65  en") API is a re
00f0: 77 72 69 74 65 20 6f 66 20 61 6e 20 6f 6c 64 65  write of an olde
0100: 72 20 43 2b 2b 20 6d 69 6e 69 2d 70 72 6f 6a 65  r C++ mini-proje
0110: 63 74 2c 20 72 65 69 6d 70 6c 65 6d 65 6e 74 65  ct, reimplemente
0120: 64 20 69 6e 20 43 20 28 6d 61 69 6e 6c 79 20 61  d in C (mainly a
0130: 73 20 61 20 67 65 74 74 69 6e 67 2d 62 61 63 6b  s a getting-back
0140: 2d 74 6f 2d 43 20 65 78 65 72 63 69 73 65 29 2e  -to-C exercise).
0150: 20 49 74 20 69 73 20 67 65 61 72 65 64 20 74 6f   It is geared to
0160: 77 61 72 64 73 20 74 6f 6b 65 6e 69 7a 69 6e 67  wards tokenizing
0170: 20 73 63 72 69 70 74 2d 6c 69 6b 65 20 67 72 61   script-like gra
0180: 6d 6d 61 72 73 2e 20 57 68 69 6c 65 20 69 74 20  mmars. While it 
0190: 64 6f 65 73 20 6e 6f 74 20 63 6f 6e 74 61 69 6e  does not contain
01a0: 20 61 6e 79 20 61 63 74 75 61 6c 20 70 61 72 73   any actual pars
01b0: 69 6e 67 2f 69 6e 74 65 72 70 72 65 74 61 74 69  ing/interpretati
01c0: 6f 6e 20 63 6f 64 65 2c 20 74 68 65 20 74 6f 6b  on code, the tok
01d0: 65 6e 20 6d 6f 64 65 6c 20 73 68 6f 75 6c 64 20  en model should 
01e0: 62 65 20 66 6c 65 78 69 62 6c 65 20 65 6e 6f 75  be flexible enou
01f0: 67 68 20 74 6f 20 73 75 70 70 6f 72 74 20 61 20  gh to support a 
0200: 77 69 64 65 20 76 61 72 69 65 74 79 20 6f 66 20  wide variety of 
0210: 70 61 72 73 65 72 73 2f 69 6e 74 65 72 70 72 65  parsers/interpre
0220: 74 65 72 73 2e 20 46 6f 72 20 65 78 61 6d 70 6c  ters. For exampl
0230: 65 2c 20 65 78 70 65 72 69 6d 65 6e 74 61 74 69  e, experimentati
0240: 6f 6e 20 77 69 74 68 20 74 68 65 20 5b 68 74 74  on with the [htt
0250: 70 3a 2f 2f 77 77 77 2e 68 77 61 63 69 2e 63 6f  p://www.hwaci.co
0260: 6d 2f 73 77 2f 6c 65 6d 6f 6e 2f 7c 4c 65 6d 6f  m/sw/lemon/|Lemo
0270: 6e 20 70 61 72 73 65 72 20 67 65 6e 65 72 61 74  n parser generat
0280: 6f 72 5d 20 68 61 73 20 73 68 6f 77 6e 20 69 74  or] has shown it
0290: 20 74 6f 20 62 65 20 66 61 69 72 6c 79 20 73 74   to be fairly st
02a0: 72 61 69 67 68 74 66 6f 72 77 61 72 64 20 74 6f  raightforward to
02b0: 20 69 6e 74 65 67 72 61 74 65 20 54 6f 65 6b 69   integrate Toeki
02c0: 6e 20 77 69 74 68 20 4c 65 6d 6f 6e 2e 20 28 41  n with Lemon. (A
02d0: 6e 20 65 61 72 6c 69 65 72 20 76 65 72 73 69 6f  n earlier versio
02e0: 6e 20 6f 66 20 74 68 69 73 20 6d 6f 64 65 6c 20  n of this model 
02f0: 77 61 73 20 74 68 65 20 62 61 73 69 73 20 66 6f  was the basis fo
0300: 72 20 61 6e 20 65 78 70 65 72 69 6d 65 6e 74 61  r an experimenta
0310: 6c 20 73 63 72 69 70 74 69 6e 67 20 65 6e 67 69  l scripting engi
0320: 6e 65 20 77 69 74 68 20 61 20 63 75 73 74 6f 6d  ne with a custom
0330: 20 70 61 72 73 65 72 2c 20 77 68 69 63 68 20 65   parser, which e
0340: 76 65 6e 74 75 61 6c 6c 79 20 77 61 73 20 61 62  ventually was ab
0350: 61 6e 64 6f 6e 65 64 20 66 6f 72 20 6c 61 63 6b  andoned for lack
0360: 20 6f 66 20 61 20 64 65 63 65 6e 74 20 56 4d 2e   of a decent VM.
0370: 29 0d 0a 0d 0a 54 68 65 20 62 61 73 69 63 20 6d  )....The basic m
0380: 6f 64 65 6c 20 72 65 76 6f 6c 76 65 73 20 61 72  odel revolves ar
0390: 6f 75 6e 64 20 61 20 63 68 61 69 6e 20 6f 66 20  ound a chain of 
03a0: 74 6f 6b 65 6e 73 20 28 64 65 66 69 6e 65 64 20  tokens (defined 
03b0: 62 79 20 61 20 73 69 6d 70 6c 65 20 73 74 72 75  by a simple stru
03c0: 63 74 75 72 65 20 74 79 70 65 29 2c 20 77 68 65  cture type), whe
03d0: 72 65 20 61 20 63 68 61 69 6e 20 69 73 20 61 20  re a chain is a 
03e0: 64 6f 75 62 6c 79 2d 6c 69 6e 6b 65 64 20 6c 69  doubly-linked li
03f0: 73 74 20 6f 66 20 74 6f 6b 65 6e 73 2e 20 4f 6e  st of tokens. On
0400: 63 65 20 61 20 63 68 61 69 6e 20 69 73 20 69 6e  ce a chain is in
0410: 20 70 6c 61 63 65 2c 20 69 74 20 65 66 66 65 63   place, it effec
0420: 74 69 76 65 6c 79 20 61 73 20 61 6e 20 69 6e 66  tively as an inf
0430: 69 6e 69 74 65 2d 6c 6f 6f 6b 61 68 65 61 64 2f  inite-lookahead/
0440: 6c 6f 6f 6b 62 61 63 6b 20 62 75 66 66 65 72 2e  lookback buffer.
0450: 20 54 68 65 20 74 6f 6b 65 6e 69 6e 69 7a 65 72   The tokeninizer
0460: 20 69 74 73 65 6c 66 20 62 75 69 6c 64 73 20 75   itself builds u
0470: 70 20 74 68 65 73 65 20 63 68 61 69 6e 73 20 61  p these chains a
0480: 6e 64 20 70 72 6f 76 69 64 65 73 20 61 6e 20 41  nd provides an A
0490: 50 49 20 66 6f 72 20 6e 61 76 69 67 61 74 69 6e  PI for navigatin
04a0: 67 2c 20 6d 61 6e 69 70 75 6c 61 74 69 6e 67 2c  g, manipulating,
04b0: 20 61 6e 64 20 6d 61 6e 61 67 69 6e 67 20 74 68   and managing th
04c0: 65 6d 2e 0a 5a 20 33 61 35 34 31 64 33 35 39 66  em..Z 3a541d359f
04d0: 34 39 66 64 64 35 62 39 62 64 64 31 39 35 35 35  49fdd5b9bdd19555
04e0: 64 65 63 66 65 35 0a                             decfe5.
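
The wiki text decoded above describes a chain of tokens held in a doubly-linked list, which then serves as an unlimited lookahead/lookback buffer. The following is a minimal sketch of that idea in C, not the Toekin API itself: every name here (tok, tok_kind, tok_chain_build, tok_peek, tok_equals, tok_chain_free) and the toy word/number/punctuation grammar are assumptions made for illustration. The actual sources are in the toekin subdir of the source tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds for a toy grammar; not taken from Toekin. */
typedef enum { TOK_WORD, TOK_NUMBER, TOK_PUNCT } tok_kind;

/* One link in the chain. begin/len reference the caller's text (not owned). */
typedef struct tok {
    tok_kind kind;
    const char *begin;
    size_t len;
    struct tok *prev;
    struct tok *next;
} tok;

/* Frees every token from head to the end of the chain. */
void tok_chain_free(tok *head)
{
    while (head) {
        tok *next = head->next;
        free(head);
        head = next;
    }
}

/* Splits text into a doubly-linked chain. Returns the first token,
   or NULL if the text holds no tokens or an allocation fails. */
tok *tok_chain_build(const char *text)
{
    tok *head = NULL, *tail = NULL;
    const char *p = text;
    assert(text != NULL);
    while (*p) {
        if (isspace((unsigned char)*p)) { ++p; continue; }
        const char *start = p;
        tok_kind kind;
        if (isalpha((unsigned char)*p) || *p == '_') {
            kind = TOK_WORD;
            while (isalnum((unsigned char)*p) || *p == '_') ++p;
        } else if (isdigit((unsigned char)*p)) {
            kind = TOK_NUMBER;
            while (isdigit((unsigned char)*p)) ++p;
        } else {
            kind = TOK_PUNCT;   /* any other single character */
            ++p;
        }
        tok *t = malloc(sizeof *t);
        if (!t) { tok_chain_free(head); return NULL; }
        t->kind = kind;
        t->begin = start;
        t->len = (size_t)(p - start);
        t->prev = tail;
        t->next = NULL;
        if (tail) tail->next = t; else head = t;
        tail = t;
    }
    return head;
}

/* Lookahead (offset > 0) or lookback (offset < 0) from t.
   Returns NULL when the offset runs off either end of the chain. */
const tok *tok_peek(const tok *t, int offset)
{
    while (t && offset > 0) { t = t->next; --offset; }
    while (t && offset < 0) { t = t->prev; ++offset; }
    return t;
}

/* Returns nonzero if the token's text is exactly s. */
int tok_equals(const tok *t, const char *s)
{
    return t && strlen(s) == t->len && memcmp(t->begin, s, t->len) == 0;
}
```

Because the whole chain stays in memory, a parser built on such a structure can inspect any distance forward or backward, for example tok_peek(cur, 2) or tok_peek(cur, -1), without buffering logic of its own. The cost is one allocation per token and a source buffer that must outlive the chain.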