ECE391: Computer Systems Engineering Fall 2021
Machine Problem 2
Midterm 1: Tuesday 28 September
Checkpoint 1: Tuesday 5 October
Checkpoint 2: Tuesday 12 October
Device, Data, and Timing Abstractions
Read the whole document before you begin, or you may miss points on some requirements, such as the bug log.
In this machine problem, you will extend a video game consisting of about 5,000 lines of code with additional graphical features and a serial port device. The code for the game is reasonably well-documented, and you will need to read and understand the code in order to succeed, thus building your ability to explore and comprehend existing software systems. Most code that you will encounter is neither as small nor as well documented—take a look at some of the Linux sources for comparison—but this assignment should help you start to build the skills necessary to extend more realistic systems. As your effort must span the kernel/user boundary, this assignment will also expose you to some of the mechanisms used to manage these interactions, many of which we will study in more detail later in the course. Before discussing the tasks for the assignment, let’s discuss the skills and knowledge that we want you to gain:
• Learn to write code that interacts directly with devices.
• Learn to abstract devices with system software.
• Learn to manipulate bits and to transform data from one format into another.
• Learn the basic structure of an event loop.
• Learn to use a mutex with the pthread API.
Device protocols: We want you to have some experience writing software that interacts directly with devices and must adhere to the protocols specified by those devices. Similar problems arise when one must meet software interface specifications, but you need experience with both in order to recognize the similarities and differences. Unfortunately, most of the devices accessible from within QEMU have fully developed drivers within Linux. The video card, however, is usually managed directly from user-level so as to improve performance, thus most of the code is in other software packages (such as XFree86). We are also fortunate to have a second device designed by two previous staff members. The device is a game controller called the Tux Controller (look at the back of the board) that attaches to a USB port. You can find one on each of the machines in the lab. On the Tux Controller board is an FTDI “Virtual Com Port” (VCP) chip, which together with driver software in Windows makes the USB port appear as an RS232 serial port. QEMU is then configured to map a QEMU-emulated serial port on the virtual machine to the VCP-emulated serial port connected to the Tux Controller. In this assignment, you will write code that interacts directly with both the (emulated) video card and the game controller board.
Device abstraction: Most devices implement only part of the functionality that a typical user might associate with them. For example, disk drives provide only a simple interface through which bits can be stored and retrieved in fixed-size blocks of several kB. All other functionality, including everything from logical partitions and directories to variable-length files and file-sharing semantics, is supported by software, most of which resides in the operating system. In this machine problem, you will abstract some of the functionality provided by the Tux controller board.
Format interchange: This machine problem gives you several opportunities for working with data layout in memory and for transforming data from one form to another. Most of these tasks relate to graphics, and involve taking bit vectors or pixel data laid out in a form convenient to C programmers and changing it into a form easily used by the Video Graphics Array (VGA) operating in mode X. Although the details of the VGA mode X layout are not particularly relevant today, they do represent the end product of good engineers working to push the limits on video technology. If you work with cutting-edge systems, you are likely to encounter situations in which data formats have been contorted for performance or to meet standards, and you are likely to have to develop systems to transform from one format to another.
Event loops: The idea of an event loop is central to a wide range of software systems, ranging from video games and discrete event simulators to graphical user interfaces and web servers. An event loop is not much different than a state machine implemented in software, and structuring a software system around an event loop can help you to structure your thoughts and the design of the system. In this machine problem, the event loop is already defined for you, but be sure to read it and understand how the implementation enables the integration of various activities and inputs in the game.
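To make the comparison with a state machine concrete, here is a minimal sketch of an event loop in C. It is not the loop from adventure.c: the event and state names, the game_t structure, and both functions are hypothetical, and events are read from an array rather than from devices or timers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events and states; none of these names come from the game. */
typedef enum { EV_NONE, EV_MOVE, EV_TICK, EV_QUIT } event_t;
typedef enum { ST_RUNNING, ST_DONE } state_t;

typedef struct {
    state_t state;
    int position;   /* advanced by EV_MOVE */
    int ticks;      /* advanced by EV_TICK */
} game_t;

/* One state-machine transition: consume a single event and update the state. */
void handle_event(game_t *g, event_t ev)
{
    assert(g != NULL);
    switch (ev) {
    case EV_MOVE: g->position++;      break;
    case EV_TICK: g->ticks++;         break;
    case EV_QUIT: g->state = ST_DONE; break;
    default:                          break;
    }
}

/* The loop itself: fetch the next event, dispatch it, repeat until done.
 * Returns the number of events handled. */
size_t run_event_loop(game_t *g, const event_t *events, size_t n)
{
    size_t handled = 0;

    assert(g != NULL && events != NULL);
    while (g->state != ST_DONE && handled < n) {
        handle_event(g, events[handled]);
        handled++;
    }
    return handled;
}
```

All inputs pass through the single dispatch point in handle_event, which is what lets a loop of this shape integrate several kinds of activity in one place.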
Threading: Multiple threads of execution allow logically separate tasks to be executed using synchronous operations. If one thread blocks waiting for an operation to complete, other threads are still free to work. In this machine problem, we illustrate the basic concepts by using a separate thread to clear away status messages after a fixed time has passed. You will need to synchronize your code in the main thread with this helper thread using a POSIX mutex. You may want to read the class notes on POSIX threads to help you understand these interactions. Later classes will assume knowledge of this material.
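The following is a minimal sketch of the pattern described above: a helper thread that clears a status message after a delay, with a mutex protecting the shared message. It is not the game's helper thread. The names msg_lock, status_msg, show_status, clear_status_later, and status_length are hypothetical, and the real code may wait in a different way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Hypothetical shared state: a status message and the lock protecting it. */
static pthread_mutex_t msg_lock = PTHREAD_MUTEX_INITIALIZER;
static char status_msg[40] = "";

/* Main-thread side: replace the message while holding the lock. */
void show_status(const char *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&msg_lock);
    strncpy(status_msg, s, sizeof(status_msg) - 1);
    status_msg[sizeof(status_msg) - 1] = '\0';
    pthread_mutex_unlock(&msg_lock);
}

/* Helper thread: sleep for *arg milliseconds, then clear the message
 * while holding the same lock. */
void *clear_status_later(void *arg)
{
    long delay_ms = *(const long *)arg;
    struct timespec ts;

    assert(delay_ms >= 0);
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (delay_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);

    pthread_mutex_lock(&msg_lock);
    status_msg[0] = '\0';
    pthread_mutex_unlock(&msg_lock);
    return NULL;
}

/* Any reader also takes the lock, so it never sees a half-written message. */
size_t status_length(void)
{
    size_t len;

    pthread_mutex_lock(&msg_lock);
    len = strlen(status_msg);
    pthread_mutex_unlock(&msg_lock);
    return len;
}
```

A caller would start the helper with pthread_create(&tid, NULL, clear_status_later, &delay_ms) and later reclaim it with pthread_join. Every access to status_msg, from either thread, happens between a lock and an unlock of the same mutex.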
Software examples and test strategies: In addition to the five learning objectives for the assignment, this machine problem provides you with examples of software structure as well as testing strategies for software components.
When you design software interfaces, you should do so in a way that allows you to test components separately in this manner and thus to deal with bugs as soon as possible and in as small a body of code as possible. Individual function tests and walking through each function in a debugger are also worthwhile, but hard to justify in an academic setting.
The modex.c file is compiled by the Makefile to create a separate executable that returns your system to text mode. We also made use of a technique known as a memory fence to check some of the more error-prone activities in the file; read the code to understand what a memory fence is and what it does for you.
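As a starting point before reading the code, here is a minimal sketch of one common meaning of the term, assuming it is used in the guard-region sense (known bytes placed on both sides of a buffer and checked later) rather than in the CPU memory-barrier sense. The names and sizes below are hypothetical and are not taken from modex.c.

```c
#include <assert.h>
#include <string.h>

#define FENCE_WIDTH 16      /* hypothetical guard size in bytes */
#define FENCE_MAGIC 0xF3    /* hypothetical fill value */
#define BUF_SIZE    64      /* hypothetical payload size in bytes */

/* The payload sits between two guard regions. */
static unsigned char storage[BUF_SIZE + 2 * FENCE_WIDTH];

/* Fill both guard regions and return a pointer to the usable payload. */
unsigned char *fence_init(void)
{
    memset(storage, FENCE_MAGIC, FENCE_WIDTH);
    memset(storage + FENCE_WIDTH + BUF_SIZE, FENCE_MAGIC, FENCE_WIDTH);
    return storage + FENCE_WIDTH;
}

/* Return 1 if both guard regions still hold the magic value, else 0. */
int fence_intact(void)
{
    int i;

    for (i = 0; i < FENCE_WIDTH; i++) {
        if (storage[i] != FENCE_MAGIC ||
            storage[FENCE_WIDTH + BUF_SIZE + i] != FENCE_MAGIC) {
            return 0;
        }
    }
    return 1;
}

/* Debugging aid: abort as soon as a write past either end is detected. */
void fence_assert(void)
{
    assert(fence_intact() && "write outside buffer bounds detected");
}
```

A write that runs one byte past either end of the payload changes a guard byte, so a later check catches the error close to the code that caused it.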
The Pieces Provided
You are given a working but not fully-functional copy of the source tree for an adventure game along with a skeletal kernel module for the Tux controller. The Tux controller boards are attached to each machine in the lab.
The table below explains the contents of the source files.

File          Contents
adventure.c   Game logic, including the main event loop, helper thread, timing logic, display control logic (motion and scrolling), and a simple command interpreter (command execution code is in world.c).
assert.c      Support for assertions and cleanups as design and debugging aids.
input.c       Input control. Provides for initialization and shutdown of the input controller. The version provided to you supports keyboard control. You must write the support for the Tux controller. Can be compiled stand-alone to test the input device.
modex.c       Functions to support use of VGA mode X. Includes things like switching from text mode to mode X and back, and clearing the screens. Provides a logical view window abstraction that is drawn in normal memory and copied into video memory using a double-buffering strategy. When the logical view is shifted, data still on the screen are retained, thus only those portions of the logical view that were not previously visible must be drawn. Finally, supports mapping from pixelized graphics in formats convenient to C into drawing a screen at a certain logical position. Relies on photo.c to provide vertical and horizontal lines from the photo images. Is also compiled stand-alone to create the tr text mode restoration program.
photo.c       Support for reading photo and object image data and mapping them into VGA colors. Also draws vertical and horizontal lines into graphics buffers for use with scrolling.
text.c        Text font data and conversion from ASCII strings to pixelized graphic images of text (you must write the latter).
world.c       Populates game world by setting up virtual rooms, placing objects, and executing commands issued by the player.
We have also included two stripped binaries to illustrate your end goal. The adventure-demo program is a fully working version of the game that allows both keyboard and Tux controller input. The input-demo program is a stand-alone compilation of input.c, again allowing both forms of input. Finally, the mp2photo.c program can be used to transform 24-bit BMP files into photos for the game, and the mp2object.c program can be used for object images. Neither tool is needed for your assignment.
The tr Program
The Makefile provided to you builds both the adventure game and a text-mode-restoration program called tr. The latter program is provided to help you with debugging. One difficulty involved with debugging code that makes use of the video hardware is that the display may be left in an unusable (that is, non-text-mode) state when the program crashes, hangs, or hits a breakpoint. In order to force the display back into text mode for debugging purposes (or, if you are not running the program in a debugger, to regain control of your shell), you can run the tr program. Unless you are fairly confident in your ability to type without visual feedback, we recommend that you keep a second virtual console (CTRL-ALT-F1 through F6) logged in with the command to execute the text restoration program pre-typed, allowing you to simply switch into that console and press Enter. Using this program is substantially easier than rebooting your machine to put it back into text mode.
You should also look at the cleanup handlers provided by the assert module (assert.h and assert.c). These cleanup handlers provide support for fatal exceptions, putting the machine back into a usable state when your program crashes. However, there may be instances and problems not covered by the handlers, and GDB can stop the program before the handlers are invoked, leaving the machine in an unusable state until you restore text mode with tr.
Mode X and Graphic Images
Mode X is a 256-color graphics mode with 320×200 resolution. It was not supported by the standard graphics routines that came with the original VGAs, but was supported by the VGA hardware itself, and was quickly adopted as the standard for video games at the time because of certain technical advantages over the documented 256-color mode (mode 13h, where the ‘h’ stands for hexadecimal). In particular, mode X supports multiple video pages, allowing a program to switch the display between two screens, drawing only to the screen not currently displayed, and thereby avoiding the annoying flicker effects associated with showing a partially-drawn image. This technique is called double-buffering.
Each pixel in mode X is represented by a one-byte color value. Although only 256 colors are available in mode X, the actual color space is 18-bit, with 6-bit depth for red, green, and blue saturation. A level of indirection is used to map one-byte pixel color values into this space. The table used in this mapping is called a palette, as it is analogous to a painter’s palette, which in theory holds an infinite range of colors, but can only hold a few at any one time. Palettes are often used to reduce the amount of memory necessary to describe an image, and are thus useful on embedded devices even today. For your final projects, some of you may want to play with graphic techniques such as palette color selection to best represent a more detailed image and dithering to map a more detailed image into a given palette.
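The level of indirection described above can be sketched in a few lines of C. This is only an illustration: the palette_entry_t type, the palette array, both function names, and the packed 18-bit layout returned by pixel_to_rgb18 are assumptions made for the example, and no VGA registers are touched.

```c
#include <assert.h>
#include <stdint.h>

/* One palette entry: 6 bits each of red, green, and blue (values 0..63). */
typedef struct { uint8_t r, g, b; } palette_entry_t;

/* 256 entries, one for each possible one-byte pixel value. */
palette_entry_t palette[256];

/* Store a 24-bit color (8 bits per channel) by dropping the low 2 bits
 * of each channel, leaving 6 bits per channel. */
void palette_set(uint8_t index, uint8_t r8, uint8_t g8, uint8_t b8)
{
    palette[index].r = r8 >> 2;
    palette[index].g = g8 >> 2;
    palette[index].b = b8 >> 2;
}

/* Look up the color for a pixel value, packed here (hypothetically) as
 * 18 bits in the order RRRRRRGGGGGGBBBBBB. */
uint32_t pixel_to_rgb18(uint8_t pixel)
{
    const palette_entry_t *e = &palette[pixel];

    assert(e->r < 64 && e->g < 64 && e->b < 64);
    return ((uint32_t)e->r << 12) | ((uint32_t)e->g << 6) | (uint32_t)e->b;
}
```

An image stored this way costs one byte per pixel plus one shared 256-entry table, which is where the memory savings come from.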
The mapping between screen pixels and video memory in mode X is a little contorted, but is not as bad as some of the older modes. The screen is first separated into blocks four pixels wide and one pixel high. Each block corresponds to a single memory address from the processor’s point of view. You may at this point wonder how four bytes of data get crammed into a single memory address (one byte in a byte-addressable memory). The answer, of course, is that they don’t. For performance reasons, the VGA memory was 32-bit, and the interface between the processor and the VGA is used to determine how the processor’s 8-bit reads and writes are mapped into the VGA’s 32-bit words. For example, when the processor writes to a memory address, four bits of a specific VGA register are used to enable (1 bits) or disable (0 bits) writes to 8-bit groups of the corresponding word in VGA memory. When drawing rectangles of one color, this mask reduces the work for the processor by a factor of four. When drawing images, however, it poses a problem in that adjacent pixels must be written using different mask register settings. As changing a VGA register value is relatively slow, the right way to use mode X is to split the image data into four groups of interleaved pixels, set the mask for each group, and write each group as a whole using the x86 string instructions (you won’t need to write any of these, although you may want to look at how it is done in the existing code or in the x86 manual).
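[Figure: mode X mapping of screen pixels to video memory for a screen starting at address 0xA1234. Labels: address 0xA1284 (0xA1234 + 80); address 0xA12D4 (0xA1234 + 160); a repeating plane sequence 0 1 2 3 across each row; a block of pixels all at address 0xA1236, with plane 3 marked.]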
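The pixel-to-memory mapping described above can be sketched as follows. The function names are hypothetical, address wraparound is ignored, and the assumption that bit p of the write mask enables plane p is stated here for illustration rather than taken from the handout.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_WIDTH 320                  /* pixels per row */
#define ROW_ADDRS    (SCREEN_WIDTH / 4)   /* 80 addresses per row */

/* Address, from the processor's point of view, of the 4x1 block that
 * contains pixel (x, y) on a screen starting at screen_start. */
uint32_t pixel_address(uint32_t screen_start, int x, int y)
{
    assert(x >= 0 && x < SCREEN_WIDTH && y >= 0);
    return screen_start + (uint32_t)y * ROW_ADDRS + (uint32_t)x / 4;
}

/* Plane (0..3) that holds pixel column x within its block. */
int pixel_plane(int x)
{
    assert(x >= 0);
    return x & 3;
}

/* Write-mask value selecting only the plane that holds column x
 * (assumes bit p of the mask enables writes to plane p). */
uint8_t plane_write_mask(int x)
{
    return (uint8_t)(1u << pixel_plane(x));
}
```

With a screen start of 0xA1234, pixels (8, 0) through (11, 0) all map to address 0xA1236 and differ only in plane, while pixel (0, 1) maps to 0xA1284.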
Mode X video memory is memory-mapped and runs from (physical) memory address 0xA0000 to 0xAFFFF, for a total of 2^16 addresses and 256 kB of addressable memory (counting all four planes). The addresses used in your program are virtual rather than physical addresses, a concept to be discussed later in the course; don’t be surprised if they are not the same as the physical addresses, though. A full screen occupies 320 × 200/4 = 16,000 addresses, so four fit into the full memory, with a little extra room. The VGA can be directed to start the display from any address in its memory; addresses wrap around if necessary. In the figure shown above, for example, the screen starts at address 0xA1234.
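The arithmetic in this paragraph can be checked with a few constants. The enumerator names and the wrap_video_offset helper are hypothetical; the numbers come directly from the text.

```c
#include <assert.h>

enum {
    VIDEO_ADDRS  = 0xAFFFF - 0xA0000 + 1,       /* 65,536 = 2^16 addresses    */
    PLANES       = 4,
    VIDEO_BYTES  = VIDEO_ADDRS * PLANES,        /* 262,144 bytes = 256 kB     */
    SCREEN_ADDRS = 320 * 200 / PLANES,          /* 16,000 addresses per screen */
    FULL_SCREENS = VIDEO_ADDRS / SCREEN_ADDRS,  /* 4 full screens fit          */
    SPARE_ADDRS  = VIDEO_ADDRS % SCREEN_ADDRS   /* 1,536 addresses left over   */
};

/* Offsets into video memory wrap around at the end of the 2^16 range. */
unsigned wrap_video_offset(unsigned offset)
{
    return offset % VIDEO_ADDRS;
}
```

The "little extra room" is therefore 65,536 - 4 × 16,000 = 1,536 addresses.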
Due to the timing behavior of emulated interactions with video memory, the code provided to you does not use a traditional double-buffering model in which the off-screen image is drawn directly within video memory. As with double-buffering, we do maintain two regions within the video memory for screen images. However, we use regular memory to maintain an image of the screen and to update that image in place. When a new image is ready for display, we copy it into one of two regions of the video memory (the one not being displayed) and then change the VGA registers to display the new image. Copying an entire screen into video memory using one x86 string instruction seems to take about as long as writing a small number of bytes to video memory under QEMU, thus our image display is actually faster than trying to draw a vertical line in video memory, which requires one MOV instruction per vertical pixel.
Only the modex.c file relies on mode X. The rest of the code should use a graphics format more convenient to C. In particular, an image that is X pixels wide and Y pixels high should be placed in an array of dimensions [Y][X]. The type of the array depends on the color depth of the image, and in our case is an unsigned char to store the 8-bit color index for a pixel. When you write code to generate graphic images from text, as described in a later section, you should use the same format.
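As an illustration of moving between the two formats, the sketch below splits a small [Y][X] image into the four groups of interleaved pixels that mode X expects. The image dimensions and the function name split_into_planes are hypothetical, and the existing code may organize this step differently.

```c
#include <assert.h>
#include <stddef.h>

#define IMG_X 8   /* hypothetical image width in pixels (multiple of 4) */
#define IMG_Y 2   /* hypothetical image height in pixels */

/* Split a C-friendly [Y][X] image into four plane images. planes[p]
 * receives every fourth pixel of each row, starting at column p, so each
 * plane can later be written as a whole under a single mask setting. */
void split_into_planes(unsigned char img[IMG_Y][IMG_X],
                       unsigned char planes[4][IMG_Y * IMG_X / 4])
{
    int p, x, y;

    assert(IMG_X % 4 == 0);
    for (p = 0; p < 4; p++) {
        size_t out = 0;

        for (y = 0; y < IMG_Y; y++) {
            for (x = p; x < IMG_X; x += 4) {
                planes[p][out++] = img[y][x];
            }
        }
    }
}
```

For an 8×2 image, plane 0 ends up holding columns 0 and 4 of row 0 followed by columns 0 and 4 of row 1.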
Graphical Mapping in the Game
The game shows a two-dimensional photo for each virtual room in the simulated world, and the screen at any time shows a region of the current photo. The photo is fully resident in memory, but could in theory be constructed dynamically as necessary to fill the screen. This facet is useful for games in which drawing the entire virtual world at once requires too much memory.
[Figure: room photo pixels and the logical view window (the size of the scrolling portion of the video screen). Planes 3, 2, 1, and 0 of the logical view are mapped into the build buffer using photo coordinates for mode X planes; the planes shift in the build buffer as the logical view moves within the room photo.]
We use a build buffer to keep the pixel data relevant to the screen organized in a form convenient for moving into video memory in mode X. However, in order to avoid having to move the data around a lot in the build buffer (or redraw the whole screen each time we move the logical view window by one pixel), we break the room photo into 4×1 pixel chunks using the mode X mapping illustrated in the previous section. The address of the logical view window in the room photo determines where the image planes are placed within the build buffer, so the planes move around within the build buffer as the logical window moves in the room photo, as shown in the accompanying figure.
[Figure: a logical view window, marking one address that is in the window for planes 1, 2, and 3 and another address that is in the window for plane 0.]
The mapping that we have described has a subtle detail: the address range used for the different planes within the logical view window may not be the same. Consider the case shown above, in which show_x & 3 == 1. As we move the logical view window around in the room photo, we need to keep each address block at a fixed plane in the build buffer (again to avoid having to copy data around). If we were to keep the planes in the order 0 to 3 and not put any extra space between them, the image of plane 0 would in this case overlap with the image of plane 1 in the build buffer. By reversing the order, we can avoid this problem (alternatively, we could have used a one-byte buffer zone between consecutive planes).
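The overlap argument can be checked with a small model. This is a simplified sketch, not the layout code from modex.c: each plane image is treated as one contiguous run of PLANE_SIZE addresses, only the window's horizontal position is considered, and the names plane_first_addr, planes_overlap, and slot_of are hypothetical.

```c
#include <assert.h>

#define PLANE_SIZE 16000   /* hypothetical: addresses per plane image */

/* First address (in photo coordinates, row 0) that lies in the window for
 * plane p when the window's left edge is at photo column show_x. */
int plane_first_addr(int show_x, int p)
{
    int first_col;

    assert(show_x >= 0 && p >= 0 && p < 4);
    /* First column at or after show_x whose low two bits equal p. */
    first_col = show_x + (int)((unsigned)(p - show_x) & 3u);
    return first_col >> 2;
}

/* Return 1 if any two plane images overlap when plane p is stored in slot
 * slot_of[p], with slots PLANE_SIZE apart and no padding between them. */
int planes_overlap(int show_x, const int slot_of[4])
{
    int start[4];
    int p, q;

    for (p = 0; p < 4; p++) {
        start[p] = slot_of[p] * PLANE_SIZE + plane_first_addr(show_x, p);
    }
    for (p = 0; p < 4; p++) {
        for (q = p + 1; q < 4; q++) {
            if (start[p] < start[q] + PLANE_SIZE &&
                start[q] < start[p] + PLANE_SIZE) {
                return 1;
            }
        }
    }
    return 0;
}
```

With show_x & 3 == 1, plane 0 starts one address later than planes 1 through 3. Storing the planes in order 0 to 3 pushes the end of plane 0 into the start of plane 1, while storing them in order 3 to 0 places the later-starting planes in the later slots, so no two images collide.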
The next problem is mapping the build buffer into the video memory. We use two buffers in video memory and map into the non-displayed buffer, then change the VGA register to show the new screen. You can read about this double-buffering technique in the code and see how it works. The complexity with regard to the plane mapping is that we must map the build buffer planes, which are defined in terms of the room photo coordinates, into the video planes, which are defined in terms of the screen coordinates. The picture below illustrates this problem. In general, a cyclic shift of the planes suffices for the mapping.
[Figure: the logical view window mapped from the build buffer layout into video screen memory by a cyclic shift of planes.]
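The cyclic shift can be written down directly. In this sketch, which uses hypothetical function names and is not the copy routine from modex.c, video plane v holds the screen columns whose low two bits equal v, and screen column 0 corresponds to photo column show_x.

```c
#include <assert.h>

/* Room-photo (build buffer) plane that supplies video plane v when the
 * window's left edge is at photo column show_x. */
int build_plane_for_video_plane(int show_x, int v)
{
    assert(show_x >= 0 && v >= 0 && v < 4);
    return (show_x + v) & 3;
}

/* Extra address offset for that plane: 1 if its first column inside the
 * window falls in the next 4-pixel block of the photo, otherwise 0. */
int build_addr_bump(int show_x, int v)
{
    assert(show_x >= 0 && v >= 0 && v < 4);
    return ((show_x & 3) + v) >> 2;
}
```

For show_x & 3 == 1, video planes 0, 1, 2, and 3 are fed from photo planes 1, 2, 3, and 0, and only the last of these starts one address later, which matches the plane 0 case discussed earlier.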
The next question is the size of the build buffer. If we can limit the size of the room photo, we can allocate a build buffer large enough to hold any logical view window. If the window starts at pixel (0,0) in the room photo, plane 3 is placed at the beginning of the build buffer. If the window occupies the lower right corner of the room photo, plane 0 is placed at the end of the build buffer. Such calculations are not particularly hard.
However, we do not want to restrict the size of the room photo, so we add a level of indirection and move the relative offset of the logical view window within the build buffer as the logical view shifts within the room photo. The room photo can thus be of almost arbitrary size, and is limited only by the size of the coordinate types (32-bit indices).
The img3 and img3_off variables provide the additional level of indirection. At any point in time, adding the address calculated from the logical view window coordinates (show_x, show_y) to the img3 pointer produces a pointer to the start of plane 3 in the build buffer. However, the actual position of that plane in the build buffer may change over time. Whenever a new logical view window setting is requested, the code checks whether all four planes of the new window