See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/247924813
Converting a swap-based system to do paging in an architecture lacking page-
Conference Paper in ACM SIGOPS Operating Systems Review · December 1981 DOI: 10.1145/1067627.806595
Copyright By cscodehelp代写 加微信 cscodehelp
2 authors, including:
University of Bologna
131 PUBLICATIONS 4,737 CITATIONS SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Sustainable Exascale Computing View project
All content following this page was uploaded by on 24 October 2017. The user has requested enhancement of the downloaded file.
Converting a Swap-Based System to do Paging in an Architecture Lacking Page-Referenced Bits
~zalp , Science Division
Department of Electrical Engineering and Computer Sciences University of California, Berkeley
Berkeley, California 94720
made to the UNIX operating system for the VAX- 11/780 to convert it from a swap-based segmented system to a paging-based virtual memory system. Of particular interest is that the host machine archi- tecture does not include page-referenced bits. We discuss considerations in the design of page- replacement and load-control policies for such an architecture, and outline current work in modeling
the policies employed by the system. We describe our experience with the chosen algorithms b a s e d on benchmark-driven studies and production system use.
In the fall of 1978 the Cemputer Science Divi- sion of the University, of California at Berkeley purchased a VAX-11/78~, and arranged to run an early version of UNIX for the VAX provided by Bell Laboratories under a cooperative research agreement. The VAX was purchased because it is a 32-bit machine with a large address space, and we had hopes of running UNIXD which was being used on other smaller machines.
This material is based upon work partially sup- ported by the National Science Foundation under Grants No. MCS 7807291, MCS 7824618, MCS 7407644-A03, Defense Advanced Research Projects Agency (DoD) ARPA Order No. 4031, Monitored by the Naval Electronics Systems Command under Con- tract No. N00039-80-K-0649 and by an IBM Graduate Fellowship to the second author.
Autborls present address: Department of Comput- er Science. Coruell University. Ithaca. 14853.
* VAX and VMS are Trademarks of the Digital Equipment Corporation.
** UNIX is a Trademark of Bell Laboratories.
Permission to copy without fee all or part of this material is granted provided that the copies arc not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
This p a p e r
discusses the modifications
Except for the machine-dependent sections of code, UNIX for the VAX was quite similar to that for the PDP-II which has a 16-bit address space and no paging hardware. It made no use of the memory-
management hardware available on the VAX aside from simulating the PDP-II segment registers with VAX page table entries. The main-memory management schemes employed by this first version of the sys- tem were identical to their PDP-II counterparts– processes were allocated contiguous blocks of real memory on a first-fit basis and were swapped in their entirety. A subsequent version of the system
was capable of loading processes into noncontiguous real memory locations, called scatter ~ , and was able to swap only portions of a process, called partial swapping, as deemed necessary by the memory contention. This would become the basis for the paging system development discussed in this paper.
The user-friendliness and portability of the UNIX environment were perceived to be large advan- tages by our installation. However, application progrems outgrew our resources with the original swap-based system very quickly. The initial confi- guration of the machine had only i/2 megabyte of real memory. Although, in the long run more memory would be available, it seemed natural to incor- porate paging into UNIX and thereby support larger applications and make better use of our limited main memory. This would also provide a vehicle for researching storage hierarchy performance evalua- tion and memory-management techniques.
Search for a replmceatent policy
The VAX memory-management architecture sup- ports paging within three segments (two for user processes, one for the system). The interesting aspect of the architecture is the lack of page- referenced bits (also called use bits). Such bits typically provide the reference information which commonly-implemented page replacement algorithms such as ~Wclock1′ and Sampled Working Set (SWS) [DENN 68a] (to be described in the following sec-
@ 1981ACM 0-89791-06g-1-12/81-0078
tions) base their decisions on. Without even this minimal page reference information, the only rea- sonable algorithms for replacing pages are the First-In-First-Out (FIFO) and the Random (RAND) policies, which are known to have performances (as measured by the number of page faults generated for a given mean memory occupancy) inferior to the clock and SWS policies [BELA 66. KING 71]. To
remedy this situation, the dynamic address transla- tion mechanism of the VAX was used to detect and record references to pages. With this scheme, a page for which reference information is to be gath- e r e d is marked as invalid although it remains in
main memory. This state for a page is called the state. A reference generated to a location within this page causes an address- tr~nslatlon-not-valld fault. However. the fault handler can detect this special state of the page and thus refrains from initiating the page transfer from secondary memory. In other words, the reclaimable state for a page corresponds to a valld page with the reference bit off if the reference bit were available. Since this method of simulat- ing page-referenced bits through software has a nonnegligible cost associated with it, the relative
performance of some of the well-known replacement algorithms in this enviro~ent is no longer obvi- ous.
In VMS, the vendor-supplied operating system for the VAX, the solution to the replacement deci- sion is simple. Each process is assigned a fixed- size memory partitionj called a resident setq that is managed according to the FIFO policy. Pages that are not members of any of these resident sets are grouped together to constitute the global free list which functions as a disk cache. Although there is some isolation between the paging behavior of the various processes due to the strictly local resident sets, the coupling that is introduced through this global free list has significant per- formance implications. Lazowska [LAZO 79] reports that in his measurements based on a real workload, system performance was significantly improved by increasing the minimum size of the free list (a system generation paremeter). An unfortunate consequence of allocating fixed-size partitions to processes is that a process has its pages taken away from its resident set (relatively small in size compared to the total real memory available on the machine) and placed in the free list to be sub- sequently reclaimed even though it may be the only active process in the system.
Babaoglu has studied a class of hvbrid replacement policies that employ different algo- rithms for page replacement emongst two logical ~artitions of pages in main memory [BABA 80], This class includes the VMS algorithm described above as an instance where the resident set management is according to the FIFO policy and the free list management is approximately Least-Recently-Used (LRU). [BABA 80] shows that for a given progrem and a given amount of available memory, there exists a resident set size for which the FIFO-LRU hybrid policy achieves a fault rate close to that of the pure LRU policy while incurring a cost com- parable to that of the FIFO policy.
UNIX is particularly ill-suited for such a scheme for several reasons. The UNIX system
encourages the creation of a number of processes to accomplish most tasks– processes are cheap. These processes are nonhomogeneous; they vary greatly in size and in the manner in which they access their address space. Furthermore, in certain processes the page reference behavior varies radically over time as the process enters different phases of exe- cution. The LISP system, which initiates garbage collection after an interval of execution, is an exemple of such a process. Thus, in this environ- ment, it is unlikely that we will find a single system-wide value for the fixed resident set size that will nearly optimize a cost function that is the weighted sum of the page fault rate and the rate at which reclaimable pages are referenced for the hybrid policy. In fact, even for a single pro- cess, the value of the resident set size must vary in time in order to track different phases of its execution and the varying amounts of real memory available to it. As described earlier, the total number of pages from the free list belonging t o a certain process is a dynemic quantity due to its sensitivity to the system-wide paging activity. A
more recent version of the VMS operating system (version 2.1) attempts to remedy some of these problems by adjusting the process resident set size
within two fixed boundaries according to a heuris- tic based on global paging rates [DEC 80]. Due to its unavailability at the time, this modified ver-
of the system was not included in our studies.
Simulation studies based on actual progrem address traces showed the clock page replacement algorithm [CORB 68] to be much more robust with respect to the cost function defined above to vari- ations in the “mount of memory available to the program, the relative costs of page faults and reclaims, and the nature of the progrem itself than the fixed-partition VMS scheme [BABA 81a]. Under the simplest form of this policy, all the pages allocated to a program are thought of as ordered around the circuzference of a circle, called the loo9, according to their physical page frame number. In addition, there is pointer, called the hand, that is advanced circularly through them when page faults occur until a replacement candidate is located. A page is chosen for replacement if it has not been referenced during the time interval between two successive passages of the hand through this page. Empirically, the clock page replacement policy achieves fault rates that are very close to those of the LRU policy although it is much easier to implement [GRIT 75], On machines with reference bits, it suffices to examine reference bits associ- ated with pages as the hand passes over them. If a page has the reference bit clear when the hand passes, it has not been referenced for one revolu- tion and thus it is selected for replacement. If a page has been referenced, then the reference bit is cleared and the page remains in the loop for at least another revolution. This exemining of the reference bit along with the associated action is called the scan operation. For our environment, this algorithm can remain unchanged since setting the reference bit associated with a page corresponds to moving it from the reclaimable to the valid state whereas resetting its reference bit corresponds to moving it from the valid to the reclaimable state.
Another major departure in our UNIX memory management from the VMS design resulted from our decision to apply the clock page replacement algo- rithm globally to all pages in the system rather than locally to the pages for each process. This results in a variable-size memory partition for each process. This was motivated by studies where global versions of fixed-partition replacement pol- icies had been found to have better performances than their local counterparts [OLIV 74, SMIT 80, SMIT 81], and some special properties of our enviroement,
(i) The relative simplicity of the global clock policy and, consequently, the ease of imple-
(ii) The projected workload for the system had no requirement of guaranteed response times as in real-time applications.
(iii) It was unreasonable to expect users to specify the sizes of the fixed program parti- tions since from the existing system they had little or no information about the memory requirements of programs.
(iv) Without reference bits, the cost of imple- menting variable-partition local replacement policies such as SWS and Page Fault Frequency
[CHU 76] was too high. We further comment on this in the following section.
(v) UNIX encourages the construction of tasks consisting of two or more processes communi- cating through pipes, which must he co- scheduled if they are to execute efficiently. In most instances, the activity intensity, thus the memory demand, shifts over time from the left-most p r o c e s s to the right-most pro- cess in the pipe while all of them remain active. It was our belief that in such an environment, dynamic partitioning of memory amongst these processes in real time is more appropriate than having local partitions (working sets) that are maintained in process virtual time.
Hemory demand and clock triggering
The clock page replacement policy is only engaged upon a page fault, at which time it selects
a page to be replaced. Given that the demand for memory exhibits nonuniform patterns with occasional high spikes (see Figure I), this strategy for the activation of the replacement policy is clearly suboptimal.
Having incurred the cost of page replacement policy activation, we would like to select more than a single page to be replaced in order to anti- cipate short-term demand for more memory. To this end, the system maintains a free page pool contain- ing all of the page frames that are currently not in the loop. Our version of the clock policy is triggered whenever the size of this pool drops below a threshold. Then, the algorithm scans a given number of pages per second of real time Ca simplified version of this algorithm is discused in [EAST 793). Currently, the default trigger point for the free page pool size is set at i/4 of the real memory size and the default minimum scan rate of the hand is approximately I00 pages per second. As the free page pool size further drops below the threshold, the scan rate of the hand is increased linearly up to a given maximum value. The primary
factor that determines this maximum value is the time that it takes to service a page reclaim from the loop (i.e., the time to simulate the setting of a reference bit). Measurements based on the current system indicate that this action consumes approximately 250 microseconds of processor time. Since the number of pages scanned by the clock algorithm provides an upper bound on the number of pages that can be reclaimed, the processor overhead due to the simulation of reference bits can be con- trolled by limiting this maximum scan rate. Currently, we allocate at most I0 percent of the available processor cycles to this function which implies that the maximum scan rate of the hand is limited to approximately 300 pages per second. Due to the existence of the free page pool, however, short duration memory demands far in excess of this value can be satisfied.
The system maintains enough data to be able to reclaim any page from the free page pool regardless of how it arrived there. In addition to being replenished from the loop, the free page pool also receives pages of processes that are swapped out or completed. In both cases, these pages can be reclaimed by the process upon a subsequent swap in or a future incarnation of the same code. provided of course that the pages have not been allocated for another purpose.
500 i000 1500 Real time (seconds)
]riSare 1. Number of page frames requested globally in one second intervals during a 33 minute observation period.
Given the cost to simulate the setting of a reference bit, our previous remark concerning the unsuitability of local variable partition page replacement policies in the UNIX enviror,nent is justified. As an example, using the Sampled Work- ing Set policy with a window size of I00,000 instructions (approximately I00 milliseconds on the VAX) operating with a program having a 400-page working set would consame i00 percent of the pro- cessor cycles just to simulate reference bits
(assuming that the working set of the program remained unchanged between two consecutive sample points).
The use of a modified clock page replacement algorithm where the scan rate is based on the available memory has several other advantages, as well. The length of the free page pool becomes a natural indicator of the amount of memory conten- tion in the system. As we shall see, the inability of the system to maintain some specified amount of free memory is the basis for load control, and causes process deactivation by swapping. Control of the rate of the scan allows modified page write-back activity, that is initiated when dirty pages are removed from the clock loop to be spread more uniformly over time, thereby easing contention for disk.
We found that a vast majority (over 80 per- cent) of all forks executed in the system were due to the command interpreter. Since these forks only serve to establish the context for the new process, duplication of the entire address space was wasted effort. Most of the sharp spikes in the global
memory demand pattern of Figure 1 could be attri- buted to processes forking and/or execing. The nondemand nature of these requests for memory (in the sense that they are an implementation artifact) overtaxed the page replacement algorithm and had grave performance consequences.
A natural solution to the problem would have been to include a “copy-on-write” facility to implement a fork similar to that used in various PDP-10 operating systems (such as TENEX [BOBR 72]). In this scheme, the two processes would be allowed to share the same address space and the copying at the page level would be deferred until the time of t h e first modification of a page by either process. However, this would have significantly increased the number of modifications to UNIX and hence delayed the completion of a workable and useful system. At the time, the desires of our user com- munity did not indicate that shared-memory primi- tives would be necessary in the near future, Copy-on-write paging seemed to introduce a good deal of complexity into the relatively simple sys- ten data structures to warrant support for the very small emount of computation which occurs between a fork and an exec system call.
new system facilities
The UNIX system memory-management facilities are particularly simple. Each user process has a read-only shared program area, a modifiable data area, and a stack. An exec system call overlays a process’ address space with a particular program image from a file consisting of the shared code and the initialized data. New processes are created by the fork system call which causes a process to duplicate itself. Usually, the command interpreter accomplishes its task by first creating a copy of itself to establish the context for the command and then causes this copy overlay itself with the file that is the image of the command. Except for shared progrem areas, no shared memory between processes is available. Access to files and dev- ices is through read and write system calls; no segment-based or page-based shared access to file pages is available.
Consistent with our design goals, we wished to keep changes to the system as simple as possible and orthogonal to the rest of the system design. Then, further changes in the UNIX system would not invalidate our efforts.
程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: email@example.com