Area536 :: Rebooted

Nov 1, 2023 - 6 minute read - C64 gamedev

Musings on text input for Larry 64

Leisure Suit Larry originally featured a free text input system that lets the player enter instructions for Larry to carry out. These instructions take the form of sentences in plain everyday English. Parsing and interpreting them so that a video game can make sense of them is quite a challenge. Can we teach a computer English? It seems so, but not one from the 1980s. The Commodore 64 has neither the RAM nor the CPU power to completely parse English sentence structures and grammar. But the good news is: neither did the PC on which Larry first came out. There are tricks involved!

Thanks to a number of AGI decompiler tools and repositories of old source code, I was able to figure out a lot of how the original game worked. As it turns out, the game’s text parser recognizes 1084 distinct pieces of input to which it can respond in various ways. Those pieces of input mostly take the form of individual words like ‘look’ or ‘smash’, but also complete phrases like ‘establish eye contact’. The latter is also the longest single piece of input the engine accepts.

What is needed is a way to get from free-format English words to something a simple CPU like the 6510 in the Commodore 64 can handle. Ideally I’d like to boil each item of input down to a single byte. Such a byte, being 8 bits long, can hold 256 distinct values, each of which we could equate to a phrase.

Words and word groups

The original game’s 1084 pieces of input were divided across what the AGI engine calls ‘word groups’. A word group is a simple list of synonyms: any word that is found in a specific group is treated exactly the same as all the other words in the group. This leads to the situation where you can tell Larry to “RAISE THE LOVELY DOORWAYS” and it would be interpreted the exact same way as “OPEN DOOR”. Your character will dutifully open the door to Lefty’s glamorous establishment and enter the place.

The reason why these phrases are identical comes down to word groups. The word “RAISE” is in the same group as “OPEN”. They get treated identically. The words “THE” and “LOVELY” are in the special word group that gets completely ignored by the game, so you can type them all you want but it’ll be as if they weren’t there at all. What remains is the word “DOORWAYS”, which is in a group together with “DOOR”, “DOORS”, and “DOORWAY”.
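To make the idea concrete, here is a minimal sketch of a word-group lookup in Python (not the language the game is written in, just a convenient one for illustration). The group numbers are made up, as is the choice of 0 for the ignored group; only the words themselves come from this post. Multi-word entries such as ‘establish eye contact’ are left out to keep it short.

```python
# Hypothetical group numbers; only the words come from the post.
IGNORED = 0
WORD_GROUPS = {
    "THE": IGNORED,
    "LOVELY": IGNORED,
    "WHOOPIE": IGNORED,
    "OPEN": 1,
    "RAISE": 1,
    "DOOR": 2,
    "DOORS": 2,
    "DOORWAY": 2,
    "DOORWAYS": 2,
}


def tokenize(line):
    """Return (tokens, unknown_words) for one line of player input."""
    tokens = []
    unknown = []
    for word in line.upper().split():
        group = WORD_GROUPS.get(word)
        if group is None:
            unknown.append(word)   # a word the game has never heard of
        elif group != IGNORED:
            tokens.append(group)   # synonyms collapse to one group number
        # words in the ignored group are silently dropped
    return tokens, unknown


print(tokenize("RAISE THE LOVELY DOORWAYS"))  # ([1, 2], [])
print(tokenize("OPEN DOOR"))                  # ([1, 2], [])
```

Both lines end up as the same two tokens, which is all the rest of the game ever gets to see.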

Ignorance is bliss

The number of words in the “to be ignored” group is comparatively huge: almost 100 words that the game won’t even consider. This group exists so that the game can respond a little more intelligently to these words than to words that it really does not know the first thing about.

A word like “PIANO” is not in the game at all. The interpreter may quip that you’re never going to need the word “PIANO” to win this game. The word “WHOOPIE”, on the other hand, is on the ignore list. Now you’ll never need that word to finish the game either, but the interpreter will be silent on your use of it.

Seeing as we have only a tiny amount of RAM to work with and splitting the screen modes severely limits the time we have for computation, a fairly simple decision is to be made. I ditch the entire first word group and simply ignore everything that’s not explicitly in some other word group. I don’t really care about the difference between “PIANO” and “WHOOPIE” anyway. The interpreter can be just as snarky to you about both. It saves the need to parse through almost 100 unnecessary words.

The other word groups also contain synonyms that aren’t all that relevant and can be scrapped in favor of freeing more memory and compute cycles. I’m not up to that point yet, although I’m likely to not include the word “EXPECTORATE” in my version of the game. You can say “SPIT”, right?

I’m allocating a single byte for the word group. Being 8 bits wide, a single byte won’t accommodate all of the 300+ word groups of the original game at the same time. Fortunately not all scenes need to have all word groups present at all times. For example, you won’t be ordering drinks anytime soon when you’re not around any kind of bar. Similarly you won’t be using words related to card games when you’re not gambling in a casino.
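A rough sketch of what per-scene word tables could look like, again in Python for illustration. The scene names, words and group ids are all hypothetical; the point is only that each scene has its own small table and that ids can be reused between scenes as long as each one fits in a byte.

```python
# Hypothetical scenes, words and ids, purely for illustration.
SCENE_WORDS = {
    "bar": {"ORDER": 1, "BUY": 1, "DRINK": 2, "WHISKEY": 3},
    "casino": {"PLAY": 1, "BET": 2, "CARDS": 3, "BLACKJACK": 3},
}


def check_scene_tables(scene_words):
    """Make sure every group id in every scene fits in one unsigned byte."""
    for scene, table in scene_words.items():
        for word, group in table.items():
            if not 0 <= group <= 255:
                raise ValueError(f"{scene}: {word} has id {group}")
    return True


def lookup(scene, word):
    """Return the one-byte group id of a word in a scene, or None."""
    return SCENE_WORDS[scene].get(word.upper())


check_scene_tables(SCENE_WORDS)
print(lookup("bar", "drink"))     # 2
print(lookup("casino", "drink"))  # None
```

Note that id 1 means “ORDER” in the bar and “PLAY” in the casino. That is fine, because only one scene’s table is active at a time.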

By coding the input tokenizing separately for each scene there’s likely to be some amount of duplication between scenes, but that won’t be an issue on a cartridge game. The cartridge memory comes in large blocks anyway, and I’m expecting to fill those blocks with graphical assets long before I run out of word groups for the tokenizer.

Building phrases

The input size is limited by the on-screen editor to 40 characters minus the leading ] and the cursor character, so 38 characters at most. The minimum length of a word is 2 characters and a single space delimits words. That means that, at the utmost, a line can contain 13 words that could be turned into individual tokens: 13 two-letter words plus 12 spaces is exactly 38 characters. Such a line is extremely hypothetical and nonsensical really, so I’m aiming at a more realistic maximum of 8 words on a line.
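The arithmetic behind those numbers, spelled out as a few lines of Python. The constant names are mine; the values are the ones given above.

```python
SCREEN_WIDTH = 40
RESERVED = 2        # the leading "]" prompt and the cursor character
MIN_WORD_LEN = 2

max_input = SCREEN_WIDTH - RESERVED

# n words need n * MIN_WORD_LEN letters plus (n - 1) single spaces, so
# n * (MIN_WORD_LEN + 1) - 1 <= max_input.
max_words = (max_input + 1) // (MIN_WORD_LEN + 1)

print(max_input, max_words)  # 38 13
```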

Assuming that all 8 words actually carry meaning in the game, that would translate into 8 tokens of up to 2 bytes each, so we’ll need 16 bytes maximum to store any 8-word phrase the game can throw at us.

Most phrases will take a much simpler form where a single verb is paired with a single noun, like “OPEN DOOR”. Sure, you could write “OPEN THE POD BAY DOORS HAL”, but since we’ll be ignoring “THE”, “POD”, “BAY” and “HAL” straight away and “DOORS” lives in the same word group as “DOOR”, that would quickly turn into “OPEN DOOR”: only 2 tokens, which can be matched relatively quickly to determine the next steps for the game to take.
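Here is a minimal sketch of that simplified tokenizer, in Python for readability. It assumes one byte per token and the 8-token limit discussed above; the group numbers are hypothetical. Anything that is not in a word group is simply skipped, so no separate ignore list is needed.

```python
MAX_TOKENS = 8  # the realistic maximum discussed above

# Hypothetical group numbers; there is no "ignored" group any more.
WORD_GROUPS = {
    "OPEN": 1,
    "RAISE": 1,
    "DOOR": 2,
    "DOORS": 2,
    "DOORWAY": 2,
    "DOORWAYS": 2,
}


def tokenize(line, groups=WORD_GROUPS, max_tokens=MAX_TOKENS):
    """Turn a line of input into at most max_tokens one-byte tokens."""
    tokens = bytearray()
    for word in line.upper().split():
        group = groups.get(word)
        if group is None:
            continue              # not in any word group: skip it
        if len(tokens) == max_tokens:
            break                 # token buffer is full
        tokens.append(group)
    return bytes(tokens)


print(list(tokenize("OPEN THE POD BAY DOORS HAL")))  # [1, 2]
```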

Bringing the tokens and the responses together

The game will contain a collection of phrases that will elicit a legitimate response. This collection takes the form of token sequences that we run through, one by one, to find a match for what the tokenizer gathered from the user’s input.
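A sketch of what that matching step might look like, assuming one-byte tokens and a plain linear search through the table. The token values and the response texts are placeholders of my own, not lines from the game.

```python
# Hypothetical token values.
OPEN, DOOR, LOOK = 1, 2, 3

# Each entry pairs a token sequence with a placeholder response.
RESPONSES = [
    (bytes([OPEN, DOOR]), "You open the door."),
    (bytes([LOOK, DOOR]), "It is a door."),
    (bytes([LOOK]), "You look around."),
]


def respond(tokens, table=RESPONSES):
    """Return the response for an exact token match, or None."""
    for pattern, response in table:
        if pattern == tokens:
            return response
    return None


print(respond(bytes([OPEN, DOOR])))  # You open the door.
print(respond(bytes([DOOR, OPEN])))  # None
```

A linear search is the simplest thing that could work. With only a handful of valid phrases per scene it is probably fast enough, but that is a guess until it has been measured on real hardware.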