Path: news.daimi.aau.dk!avs
From: avs@daimi.aau.dk (Alexandre Valente Sousa)
Newsgroups: daimi.beta,comp.lang.beta
Subject: Lexical Analyzer for Beta
Date: 16 Nov 1994 18:13:36 GMT
Organization: DAIMI, Computer Science Dept. at Aarhus University
Lines: 152
Message-ID: <3adi4g$9vn@belfort.daimi.aau.dk>
NNTP-Posting-Host: chrysler.daimi.aau.dk
Summary: Lexical analyzer generator for Mjolner Beta
Keywords: lexical analyzer,scanner,flex
Xref: news.daimi.aau.dk daimi.beta:169 comp.lang.beta:153

For those using the Mjolner Beta Metaprogramming System (MPS) there is
a new lexical analyzer generator available. It is only a lexical
analyzer (scanner); there is NO parser! Sorry.

The motivation for doing such a thing was that the MPS limits what can
be considered a token. E.g. it is not possible to redefine the lexeme
NameDecl so that it covers Standard ML identifiers with embedded primes
(i.e. Alpha is OK but Alpha' is not). This is because the MPS does not
use a separate lexical analyzer module but a built-in definition of the
lexemes (terminal symbols).

****

The lexical analyzer generator (called flexgen) works as follows:

0- the hard work of generating a lexical analyzer is done by flex(1),
   which is free software distributed by the Free Software Foundation
   and is expected to replace lex in the next BSD Unix distribution.
   Flexgen only generates code for flex

1- it takes as input a metagrammar (MPS syntax)

2- if the user wants support for tokens unsupported by the MPS then it
   is necessary to add, in the OPTION section of the metagrammar, a
   regular expression defining the new token, e.g.

      -- foo : agrammar : metagrammar --
      OPTION
         Tokens = ( Identifier '[a-zA-Z][a-zA-Z0-9''_]*' )
      RULE
         ::= 'foo' ::+ 'bart'

3- the above metagrammar is ALMOST accepted by the current MPS because
   options that the MPS doesn't recognize are silently ignored. There
   is only a problem with Identifier because it is not further
   specified, i.e.
   it is a terminal instead of a non-terminal. This will cause an
   error. For the time being one can just ignore such an error because
   you cannot use the MPS parser anyway (at this moment flexgen is not
   recognized by the MPS). Hopefully someday there will be some
   (special?) version of the MPS that can use flexgen.

4- generate the lexical analyzer:

      flexgen foo-meta.gram

   this generates a makefile and the C, Beta and Flex code that
   implements the lexical analyzer. There is a Beta API to the lexical
   analyzer, so you can write a Beta program that uses it (with calls
   such as getNextToken, keepComments, getCurrentString,
   getLineNumber, ...)

5- when you compile your Beta program, the Beta compiler calls the
   generated makefile, which calls flex, which generates C code that
   is then compiled and linked

Of course this is not very useful because usually one wants the
services of both a lexical analyzer and a parser. Still, that's all
that exists at the moment.

****

Advanced features:

- any token that can be represented by an extended regular expression
  (see man flexdoc(1)) can be scanned. There are a few pathological
  expressions that cannot be scanned, see man flexdoc(1).

- possibility of discarding lexemes (i.e. lexemes that will never be
  seen by getNextToken). This could be used for instance to remove
  debug statements from your code

- support for most flex features, that is, if you know flex (or lex)
  there are hooks for adding arbitrary C code that will be executed
  when a given token is recognized. Start conditions (inclusive and
  exclusive) are also supported

- nested comments

- Beta inputCallback, i.e. you supply the function that supplies the
  linear stream of characters that are to be scanned into tokens

- high performance.
  No attempt was made to measure it against the MPS. Still, since flex
  generates high performance scanners, I expect that performance can
  be at least one order of magnitude higher for long input files

****

The current status is:

- version 0.1 alpha. This version seems to be pretty stable (but no
  one has used it except me). Notice that flexgen is not a complex
  tool (complexity is handled by flex), thus I don't anticipate any
  serious bugs that cannot be easily fixed. The generated lexical
  analyzer requires version 5 of the Beta compiler

- was tested under the following systems:

      Sun-4 Sparc              SunOS 4.x
      Sun-4 Sparc              Solaris 2.x
      HP9000 series 300/400    HP-UX 8.x
      HP9000 series 700        HP-UX 9.x
      PC386
      PC286

- source code is supplied. You should be able to build it on any Unix
  system that has gmake, g++, flex, yacc, rcs, csh. Because of a
  bootstrap problem (how to scan things that could not be scanned by
  the current MPS) flexgen was not written in Beta (it just generates
  Beta code). If you have the above software and you want to build
  flexgen on a Unix system supported by the Mjolner Beta System then
  all you have to do is:

      mkdir flexgen
      cd flexgen
      gzip -dc <...>/flexgen.tar.gz | tar xvf -
      gmake

  There is also a makefile to build it on a PC with Dos 6.x and DJGPP.
  Your target system must have the Mjolner Beta System 3.x and a C
  compiler

- for those using the Mjolner Beta System on the Mac there are 3
  options: to cross-generate using a Unix system; to cross-generate
  using a PC system; to locally generate using Insignia Solutions
  SoftPC (which emulates a PC 286 on a Mac).

  The generated lexical analyzer uses hybrid C/C++ code, i.e. if you
  only have a C compiler available then it will be compiled as C code.
  I can supply binaries for PCs if you are interested (the 32-bit
  version uses the DJGPP port of the GNU tools, the 16-bit version
  uses a commercial C++ compiler)

- documentation:
  a) readme.txt (release notes)
  b) flexgen.tex, flexgen.ps (50-page manual including appendices,
     index and a flex tutorial)
  c) flexgen(1) (Unix man page)
  d) 4 demo grammars and 2 demo programs

- legal issues: although flex is distributed by the Free Software
  Foundation it is not covered by the same GNU licence as Bison, i.e.
  you can use flex (and flexgen) to generate either free or commercial
  software

- availability:
  a) if you have access to daimi.aau.dk then just grab the file
     /users/avs/public/flexgen.tar.gz (250 KBytes), extract it and
     read 'readme.txt'
  b) if you don't, then send me mail and I will arrange anonymous ftp
     access to it

- future work: it is expected that someday the MPS will be able to use
  the Beta API offered by flexgen. When that happens I will port
  flexgen so that it runs natively on the Macintosh

****

If you use it please let me know. I would especially love to hear of
any bugs, documentation errors, undocumented features or problems of
any kind.

Alexandre