Incorporating a parser into your project

by Joel Baxter · 01/12/2002 (8:38 am) · 2 comments

A few things to get out of the way first:

What is this about?

These are instructions for using .l (tokenizer definition) and .y (parser definition) files in your TGE-related project. I'm going to briefly mention a couple of things about the files themselves, then discuss how to incorporate them into VC++, and finally make some suggestions (untested) about incorporating them into the TGE makefile system, for people on other platforms or who do not use VC++ in Windows.

(Note: If you want to figure out how to change and rebuild the existing console parser in TGE, you should first see the HowTo on Recompiling changes to the parser grammar.)

I'm going to be discussing how to do the VC++ setup "from scratch". However, there does exist a relevant wizard for VC++, the BisonFlex Wizard. It's available from many download locations, including downloads.cnet.com. You may want to try it out... make sure though that you get version 1.6, not version 1.5. (Curiously, the author's homepage only has a link to version 1.5 as of this writing.) After installing that wizard, you can create a skeleton parser project in VC++ with File->New->Lexer And Parser AppWizard. However, I haven't used that wizard myself, and I'm not sure how useful it will be if you want to incorporate a parser into an existing project. And if you run into problems with it, the low-level instructions below may help resolve them.

It's also worth noting that the BisonFlex Wizard comes packaged with Windows ports of flex (.l file compiler) and bison (.y file compiler). (These are Wilbur Streett's ports.) Even if you end up not wanting to use the wizard, you may want to use those tools; they may be a little older than the latest available versions, but should still be quite serviceable. Which leads me to...


What tools do I need?

You need compilers for .l and .y files. The traditional UNIX tools for this purpose are lex and yacc. However, I'll specifically be talking about the GNU tools flex and bison; personally I use the versions of those tools that come with Cygwin.

Note for anyone worried about using GNU tools: as of version 1.24, the output of bison can be used in non-free software. The output of flex has always been OK for non-free software.

There are other lex-like and yacc-like tools available for Windows. If you use non-Cygwin versions of flex and bison, you can probably follow the procedure below fairly closely, but you may have to change some things. If you use other tools entirely, they may diverge more from what I'll be describing.

The various lex-like and yacc-like tools have varying levels of compatibility with C++ projects. One of the reasons I'm using flex and bison is that their output can happily coexist in C++-land. This may not be true for some other tools (but that issue is entirely outside the scope of this HowTo).


What else do I need to know?

You should be familiar with the way that your tools name their output files, and with the command-line arguments for your tools, especially if you're not using Cygwin flex and bison. Otherwise you will undoubtedly run into problems when you have to adapt the instructions below to your particular toolset. Things to watch out for: command-line argument syntax, output file naming conventions, and perhaps whether the tool expects path arguments in its command line to use forward slashes or backslashes. I will not take the space to mention all these potential differences each time I show a command line, so just keep them in mind if you are using different tools. I'll describe what a command line is doing, and it will be up to you to determine the correct syntax to achieve that effect with your tools.
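One way to settle the output-naming question is to run your .y compiler once on a small grammar and then look at what appeared. Here is a minimal sketch, assuming a POSIX shell such as the one that comes with Cygwin. The function name find_parser_output is made up for illustration, and it only knows about two suffixes: .tab.c (what Cygwin bison writes) and _tab.c (what some other tools write). Your tool may use something else again.

```shell
# Hypothetical helper: report which generated parser file exists for a
# given filename base. $1 = directory to look in, $2 = filename base.
find_parser_output() {
  dir=$1
  base=$2
  # Only these two naming conventions are checked; extend the list if
  # your tool uses a different one.
  for name in "$base.tab.c" "${base}_tab.c"; do
    if [ -f "$dir/$name" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no generated parser found for $base in $dir" >&2
  return 1
}
```

For example, find_parser_output . foo prints the name of whichever of the two files is present in the current directory, and returns a non-zero status if neither is.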

You also need to know how to write .l and .y files. This HowTo stays away from that topic entirely. It's only about incorporating those files into your project.


OK, onward!


Credit where credit is due

My starting point for getting .l and .y assimilated into VC++ was a Yahoo groups post, which in turn was apparently repeating suggestions from Wilbur Streett.

Also, Bryan Ross told me about the BisonFlex Wizard in the GG forums.


Files setup

1) For each .l and .y file pair that is supposed to "go together", make sure they have the same filename base.

Rationale: This is generally a good idea just to show that the files are related. Also, though, it helps when it comes time to specify a generic "how to compile this file" rule (which will be discussed later). The files generated from .l and .y files by default use a lot of standard function and variable names that begin with the prefix yy. If you have multiple parsers in your project, this will be a problem as they each will be using the same set of names; so, you'll want to give flex/bison command-line arguments to change those names to use some other prefix, specific to each parser. The easiest choice is to use the filename base as the prefix. So, for example, if the files are named foo.l and foo.y, their generated files will define and use variables named footext and foolval instead of yytext and yylval, function fooparse instead of yyparse, etc.

2) Keep the above-mentioned naming convention in mind when writing your .l and .y files. Flex/bison will automatically create macros that let you keep using some of the old yy-prefixed names inside the file where each name is defined. This doesn't cover every situation where you might want to use such names, though, especially references from outside the .l and .y files. So if your compile fails because some name with a yy prefix is undefined, you probably need to change that name to use the filename base as a prefix instead.

3) In the .l file, specify "%option never-interactive" among the options at the top of the file. You may also want "%option noyywrap" unless you need to define your own custom yywrap function.
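To check the pairing from step 1 across a whole source directory, something like the following can help. It is a minimal sketch, assuming a POSIX shell (Cygwin's will do). The function name list_parser_pairs is hypothetical, and the two renamed symbols it prints are just the parse and text examples from the rationale above, not a complete list of what gets renamed.

```shell
# Hypothetical helper: for every .l file in directory $1, report whether
# a .y file with the same filename base exists next to it.
list_parser_pairs() {
  dir=$1
  for lfile in "$dir"/*.l; do
    [ -e "$lfile" ] || continue        # the glob matched nothing
    base=$(basename "$lfile" .l)
    if [ -f "$dir/$base.y" ]; then
      # With the base as prefix, yyparse becomes ${base}parse, etc.
      echo "$base: parser entry point ${base}parse, token text ${base}text"
    else
      # Unpaired tokenizer definitions go to stderr.
      echo "$base: no matching $base.y" >&2
    fi
  done
}
```

Running list_parser_pairs on a directory that holds foo.l, foo.y and bar.l would print one line for foo on standard output and a warning about bar on standard error.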


VC++ setup

1) If you have the BisonFlex Wizard installed, and you are using its flex/bison tools, you can skip this step. Otherwise, go to Tools->Options->Directories, pick the Executable Files pane, and add the directory where flex and bison are. For Cygwin this will probably be C:\CYGWIN\BIN. If multiple versions of flex/bison might exist on your system, move the directory you just added to the top of the list, to make sure that the tools in that folder are the ones that get used.

2) Create an empty file named unistd.h among the VC++ header files. Tokenizers generated by flex typically #include <unistd.h>, which VC++ does not supply, so an empty stand-in is enough to get past that. (This may not be necessary for some .l/.y compiler tools, if they have been specifically made for Windows.) If you're uneasy with touching any of the VC++ folders, then put it in some other directory, and use the Include Files pane of Tools->Options->Directories to specify its location.

3) In your project, add the .l and .y files to the list of source files. You may also want to right-click on the folder that contains the source files, choose Properties, and add .l and .y to its list of file extensions.

4) Right-click on the .l file in the list of source files, and choose Settings.

5) In the Commands field for Custom Build, enter:
flex -L -P$(InputName) -o$(InputDir)/$(InputName).cc $(InputPath)
Here's what that command line is doing:
- The -L flag suppresses the generation of #line directives (which are used so that errors reported in the generated file can be matched up with lines in the source file). Some tools generate #line directives that the VC++ compiler is happy with, and some don't. My experience is that Cygwin flex output actually compiles OK with #line directives included, but Cygwin bison output does not. You can try removing this flag and see if things still compile.
- The -P flag handles changing the yy-prefix names to use the filename base (InputName) as a prefix instead.
- The -o flag specifies the output file. It should probably be placed in the same directory as the input file. It should have the same filename base, but with the .cc extension.
You can also add other flex options in there, of course; personally I'm currently also using "-i" so that the parser will be case-insensitive.

6) In the Outputs field, enter:
$(InputDir)\$(InputName).cc
This matches the generated output file as specified in the flex command line. The path here uses a backslash to follow Windows pathname conventions.

7) Click OK to dismiss the Settings dialog.

8) Right-click on the .y file in the list of source files, and choose Settings.

9) In the Commands field for Custom Build, enter these two lines:
bison -d -l -p $(InputName) $(InputPath)
move $(InputDir)\$(InputName).tab.c $(InputDir)\$(InputName).tab.cc
Here's what the bison command line is doing:
- The -d flag generates a header file that defines the token values. Most parser projects require this flag, so that the tokenizer can #include that file.
- The -l flag is the linenumber suppression thing again.
- And, the -p flag is for changing the yy-prefix names.
Cygwin bison generates a .tab.c file, and I want it to have a .cc extension, so I added the Windows move command on the second line, to rename the file. Note that tools other than Cygwin bison may generate a file with a _tab.c suffix instead (replacing the first period with an underscore), so you would have to change the move arguments appropriately if so.

10) In the Outputs field, enter these two lines:
$(InputDir)\$(InputName).tab.cc
$(InputDir)\$(InputName).tab.h
The .tab.h file is only generated if you have specifically asked for it to be generated (with the -d flag in the command line from the previous step). If for some reason you didn't generate it, you wouldn't enter its name in the Outputs field. Also, same warning as before, some tools may generate a _tab.h file instead.

11) Repeat steps 4-10 for all .l and .y files you are adding to the project.

12) For each .y file, right-click on it in the list of source files, and choose Compile. If there are errors, resolve them; don't proceed until you have successfully compiled each .y file.

13) For each .l file, right-click on it in the list of source files, and choose Compile. If there are errors, resolve them; don't proceed until you have successfully compiled each .l file.

14) The compilation of each .y file should have generated a .tab.cc file with the same basename, plus a .tab.h file with the same basename if you used the -d flag mentioned in step 9. The compilation of each .l file should have generated a new .cc file with the same basename. Add all these generated files to the list of source files for the project. (If there's a separate folder for header files, then of course add the generated .tab.h file there instead.)
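If a Custom Build step misbehaves, it can be easier to debug the same commands by hand from a Cygwin shell. The sketch below only prints the commands for one parser rather than running them, so you can read them over first. Several things in it are assumptions: the function name show_build_commands is hypothetical, mv stands in for the Windows move command, and the printed commands change into the source directory first. That last choice sidesteps the question of whether your bison writes its output next to the grammar file or into the current directory, which I believe differs between versions.

```shell
# Hypothetical helper: print (without running) the commands that the two
# Custom Build steps perform. $1 = source directory, $2 = filename base.
show_build_commands() {
  dir=$1
  base=$2
  echo "cd $dir"
  # Tokenizer: no #line directives, base as prefix, output to base.cc
  echo "flex -L -P$base -o$base.cc $base.l"
  # Parser: also write the token header, no #line directives, base as prefix
  echo "bison -d -l -p $base $base.y"
  # Rename the generated parser so it gets a .cc extension
  echo "mv $base.tab.c $base.tab.cc"
}
```

Calling show_build_commands src foo lists the four commands for src/foo.l and src/foo.y. Once they look right for your tools, piping that output into sh -e runs them and stops at the first failure.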


Makefile setup

Caveat hax0r: I haven't used the TGE SDK makefiles system to build anything yet, so I don't know if this works. However, if you want to modify the makefiles to be able to build a project that includes .l and .y files, try this:

1) In torque/mk/conf.common.mk, add rules for how to compile .l and .y files. Maybe like this:
%.tab.cc %.tab.h : %.y
	bison -d -l -p $(notdir $(basename $(<))) $(<)
	$(CP) $(basename $(<)).tab.c $(basename $(<)).tab.cc
	$(RM) $(basename $(<)).tab.c

%.cc : %.l
	flex -L -P$(notdir $(basename $(<))) -o$(basename $(<)).cc $(<)
You can see that I've named "bison" and "flex" directly rather than using some platform-specific compiler variables (DO.COMPILE.whatever). That's partly out of laziness, and partly because flex/bison, being platform-agnostic tools, will probably have the same name regardless of platform. However, feel free of course to add platform-specific compiler variables for .l and .y compilers to all the relevant platform .mk files, and then use those variables here.

2) The targets.projectname.mk file for your project will include a list of .cc files. Add the names of the generated .cc and .tab.cc files to that list. I don't think that the .l and .y files should be added to that list.

3) Also in the targets.projectname.mk file, add dependence specifications for your .l and .y files, maybe like this:
path/basename.tab.cc path/basename.tab.h: path/basename.y
path/basename.cc: path/basename.l path/basename.tab.h
Where path is the path to the source files (you can see what this is by looking at the list of .cc source files) and basename is whatever filename base you used for your .l and .y files.

If you're using different tools, all the previously mentioned caveats about the command line and output file naming of course apply to the makefile rules and dependence specifications as well. The examples above were specifically written with Cygwin flex/bison in mind.
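To see the dependence specifications from step 3 with real names filled in, a small shell function can generate them. This is only a sketch under the same untested assumptions as the rest of this section. The function name print_parser_deps and the example path used below are made up for illustration and are not part of the TGE makefiles.

```shell
# Hypothetical helper: print the two dependence lines for one parser.
# $1 = path to the source files, $2 = filename base of the .l/.y pair.
print_parser_deps() {
  srcdir=$1
  base=$2
  # The generated parser and its token header both come from the .y file
  echo "$srcdir/$base.tab.cc $srcdir/$base.tab.h: $srcdir/$base.y"
  # The generated tokenizer needs the .l file and the token header
  echo "$srcdir/$base.cc: $srcdir/$base.l $srcdir/$base.tab.h"
}
```

For instance, print_parser_deps game/script foo prints the two lines for a parser whose files are game/script/foo.l and game/script/foo.y, ready to paste into the targets file for your project.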

Like I said, I haven't tried using the makefile system, so I wouldn't be surprised if this isn't quite correct. :-) I'll make corrections if/when I have the occasion to try it out -- also, I'd certainly be interested in hearing corrections from anyone else who gets this sort of compilation working using the makefiles.

#1
01/10/2002 (8:40 pm)
:thumbsup:
#2
01/16/2002 (4:24 pm)
BTW, if anyone has specific directions on how to do something similar on the Mac with the provided CodeWarrior or Project Builder environments, please post.