Abstract
This article outlines an implementation of a Virtual Memory management algorithm I did on PDP-11 in mid 1980s.
Project
This was at my job at VIR. We were using an Elektronika 100-25 minicomputer, which was a Soviet clone of PDP-11 from DEC (Digital Equipment). It should really be spelled as "mini"computer, because it occupied an entire large room. DEC developed a nice database management system called Datatrieve-11, which was powerful and reliable, but unfortunately had two flaws: it did not support nice report generation at the time (and we needed lots of nice reports) and it was not fast, to put it mildly. In fact, it was so slow that we never succeeded at using it for anything practical.
My assignment
I was supposed to write something more efficient than Datatrieve-11. And I did. What I wrote was a complete scripting environment for database management, UI and report generation. It produced nice reports and it was fast. The project took about a year of programming in Macro-11, the DEC assembly language. I called it DBCalc. Too bad this was done in the USSR before the end of communist rule. If I had written that system in America in 1984, I could've made some dough.
Challenge
DBCalc compiled the source code from my DBCalc scripting language (entirely in Russian, keywords and all) to an internal bytecode, which was then interpreted. The bytecode format was loosely based on that used by the Fortran IV compiler/interpreter (yes, Fortran IV on PDP-11 was an interpretive language).
Troubles began as applications written in DBCalc started growing out of memory. There was only 64 kB for everything: the interpreter, the runtime, the compiled application code, the call stack and the data. All of this together just did not fit in 64 kB.
Solution
The solution was virtual memory. There was VM support provided by the Fortran IV runtime, but the Fortran runtime itself would take up tremendous amounts of memory, which, as I explained, I did not have. So I implemented my own virtual memory.
- I needed to be able to load any function anywhere in the address space. The bytecode format was already fully relocatable, so that part did not have to change.
- The compiler now translated the source into a library file, which started with a Table-of-Contents (TOC), each entry of which was the absolute address and length of the corresponding function in the library file itself.
- I changed the CALL bytecode command. Before the change, it took the address of the function as an argument. After the change, it took the function's TOC index instead of address.
- When the application started, it first loaded the TOC into memory and allocated an array of addresses of the same length. That array, Address_Array, would contain in-memory addresses of individual functions. It was initialized with NULLs. NULL meant that the function was not currently in memory.
- The bytecode of each function had a header field called LRU_Index, which was used to facilitate the LRU ordering of functions. The most recently called function would have the highest value of the LRU_Index.
- In addition I had a global integer called Next_LRU_Index. It was initialized with 0.
- Free memory fragments were connected in a doubly-linked list.
Here's how all of this worked:
When a function was called (or returned to from a nested call), we would first see if it was in memory by checking its cell in the Address_Array.
If it was in fact in memory, we would increment the Next_LRU_Index variable (watching for integer overflow) and put its value in the function's LRU_Index.
If the required function was not in memory, we would first check if there was enough memory to load it. If so, we would load it and proceed with the invocation as described above. Otherwise, we needed to kick some other function out of memory. We would iterate through all currently loaded functions and find the one with the lowest LRU_Index. I experimented with a double-linked list, but concluded that a linear search was on balance more efficient. We would change the function's address in Address_Array to NULL, free the memory and try again. If we still did not have enough memory, we'd kick out another function, etc. When there were no more functions to swap out, we would try to compact the entire memory. If that did not help, we were officially out of memory.
Of course, this was poor man's virtual memory, but it was simple and compact and did the job. DBCalc was still a lot faster then Datatrieve.
Tangent
The russified version of Datatrieve-11 was called ФОБРИН, abbreviated as ФБР, which is also the Russian acronym for FBI. Consequently, there were jokes abound about the FBI's involvement in the Soviet database management. This predated the internet, our computer was not connected to any network, so we weren't thinking espionage. We were mostly thinking sabotage. "Do you think ФБР can literally wear down the disks to the aluminum substrate?"
The FBI had nothing to do with our research institute, but the KGB certainly did. They killed the incredible scientist and founder of VIR, Nikolai Vavilov. Then they kept observing all of the Institute's scientific activities from their own office in the main building. The sign on the door said "Department № 1". By tradition, the KGB was always Department 1, in every organization, so colloquially instead of saying "I talked to the KGB", you would say "I visited the First Department (первый отдел)". Their official job description was to ensure proper handling of secret information. I had the displeasure of spending two hours in that office when I decided to quit my job in 1990. All I actually needed was a signature on my check-out list. Rather than merely ascertaining that I did not have possession of any secret documents, the KGB guy sat me down and wanted to know everything about me: the details of the job I was leaving, of my new job, my family history etc. I hope he enjoyed the interrogation, 'cause I certainly did not.