In this post, I am going to talk about logical and physical addressing of main memory (DRAM) in computer systems.
Logical addresses are also known as virtual addresses. In fact, I would call them logical aka virtual aka unreal addresses. Yes, as the name suggests, logical addresses are unreal addresses: they are not the real physical memory (DRAM) addresses. They are typically the addresses seen by the programmer. Any operation that works with addresses (pointer manipulation, passing around references to data and functions, printing references for debugging and diagnostics) works with logical/virtual addresses. Let’s run the following simple program:
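#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int a = 20;                              /* local variable */
    int *ptr = (int *)malloc(sizeof(int));   /* dynamic allocation */

    printf("Address of local variable a => %p \n", (void *)(&a));
    printf("Contents of ptr => %p \n", (void *)ptr);
    free(ptr);
    return 0;
}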
The above program:
- Allocates a local int variable and initializes it to value 20.
- Does a simple dynamic memory allocation for 4 bytes of memory (assuming the platform you are running on implements int as 4 bytes).
- The call to malloc() returns a pointer to the starting address of the 4 byte (at least) memory region that we can manipulate, read from, write to using the pointer variable “ptr”.
- We then print the address of variable “a”, and the address contained in pointer variable “ptr”.
Here is the output from multiple runs of the program:
Address of local variable a => 0x7fff19771394
Contents of ptr => 0x1a49010
Address of local variable a => 0x7fffca040ec4
Contents of ptr => 0xc4f010
Address of local variable a => 0x7fffb3edb0e4
Contents of ptr => 0x25ac010
Note: I compiled and ran the program on an x86-64 Linux machine with the “gcc” compiler.
What type of addresses are these?
These are logical/virtual addresses. They do not tell us the precise location where the integer value 20 is stored in main memory. Similarly, we don’t know the actual starting location of the memory region we malloc’d. Our code refers to these memory locations using the given virtual addresses, and that’s all we care about as programmers. The CPU always generates a logical address for every memory reference it needs to make to main memory (DRAM). Before the reference is made, the logical or virtual address is translated into the actual physical address of the desired location in main memory. This translation is done by the Memory Management Unit (MMU), a piece of hardware in our computer system.
An important question: why is there such a separation between logical and physical addresses? Consider what would happen without it:
- Allowing users/programmers to directly manipulate physical memory can be very dangerous for the system. The operating system and hardware lose the freedom and flexibility to manage memory with their internal algorithms, privileged instructions etc.
- The system could not guarantee memory protection (aka address space protection) among multiple processes residing in main memory. Multiple programs might be running on my system, and if programmers directly manipulate physical memory addresses, they can accidentally or intentionally corrupt the memory used by other user programs.
- Data structures and memory used by OS kernel algorithms may get corrupted.
- It would be impossible to provide any sort of abstraction (virtual memory) over main memory. Don’t worry about virtual memory for now; I am going to cover it in detail in the next post, already in the pipeline :). Just think of it as an abstraction. When users work directly with the reality (the actual physical memory addresses), any user-level abstraction provided by the operating system or hardware or both will probably be rendered useless.
- The compiler doesn’t know exactly where the program will be loaded into memory. So it can’t generate absolute addresses for symbols in the program, as these addresses are not fixed at compile time. Instead, the compiler generates offset-based addresses, aka relocatable addresses, which are then resolved to actual physical addresses during execution (execution-time address binding) when the memory references are actually made.
- If the compiler generated absolute physical addresses, it would be almost impossible to have multiple program binaries in main memory at the same time.
If user A and user B dump the addresses of various symbols in their programs (see the sample program above), they may even see the same addresses. But this doesn’t mean that those addresses refer to identical physical memory locations. They shouldn’t and they don’t (barring shared memory for now).
Every process has its own region of space in main memory, known as the physical address space of the process. Basically, it’s the set of all the physical memory regions (contiguous or not) used by the process. So, even if the logical/virtual addresses turn out to be identical, it’s the job of the MMU to translate them into the corresponding physical memory addresses, which will of course be different unless some physical memory region is shared between the processes.
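To see identical virtual addresses backed by different physical memory, here is a minimal sketch, assuming a POSIX system (it relies on fork() and waitpid()). The helper name parent_value_after_child_write is hypothetical and not part of the sample program above. After fork(), parent and child both see the variable at the same virtual address, yet the child's write never reaches the parent's copy:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite `value`, and returns
 * the parent's copy afterwards (or -1 if fork/waitpid fails). */
int parent_value_after_child_write(void)
{
    int value = 20;

    fflush(stdout);                 /* don't duplicate buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: same virtual address, private copy */
        value = 99;
        printf("child:  &value = %p, value = %d\n", (void *)&value, value);
        fflush(stdout);
        _exit(0);
    }

    assert(pid > 0);                /* only the parent reaches this point */
    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    printf("parent: &value = %p, value = %d\n", (void *)&value, value);
    return value;                   /* the child's write is not visible here */
}
```

Calling this from main() should print the same address in both lines while the parent still reads 20, because once the child writes, the two processes' identical virtual addresses are mapped to different physical locations.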
On a 32-bit system, the logical addresses generated by the CPU are 32 bits wide. So a program can address at most 2^32 bytes (4 GB), and by default that is also the limit on addressable physical memory (DRAM). As a side note, you can read up on the PAE (Physical Address Extension) scheme in the Intel x86 architecture, which allows addressing more than 4 GB of physical memory on 32-bit systems. I won’t be discussing PAE in this post.
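The 4 GB figure is just the number of distinct byte addresses that fit in 32 bits. A small sketch of that arithmetic follows; the function names are hypothetical, and the 36-bit width mentioned afterwards is my assumption about classic PAE rather than something covered in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of distinct byte addresses for a given address width. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);              /* shifting by 64 would overflow a uint64_t */
    return (uint64_t)1 << bits;
}

/* Prints the addressable size in GB (1 GB = 2^30 bytes) for a given width. */
void print_address_space(unsigned bits)
{
    printf("%u-bit addresses => %llu GB\n", bits,
           (unsigned long long)(address_space_bytes(bits) >> 30));
}
```

With 32 bits, address_space_bytes() returns 4294967296 bytes, i.e. 4 GB. With 36-bit physical addresses (the width I am assuming for PAE), the same arithmetic gives 64 GB.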
Coming back to our discussion: now that we know a bit about logical and physical addresses, let’s talk about how the address translation is done by the MMU.
Exactly how it’s done depends very much on how memory in DRAM is allocated to a process by the operating system.
Contiguous Memory Allocation – In this scheme, the process occupies a contiguous space in DRAM, starting at some physical address S and occupying the whole region up to some physical address E. The physical address space of the process is one big block of DRAM allocated to the process. The Memory Management Unit can be as simple as a combination of 2 registers: the Base Register (aka Relocation Register) and the Limit Register. The Base Register contains the starting physical address of the process in DRAM, and the Limit Register specifies the size of the region occupied by the process, relative to the base address. In other words, the Limit Register defines the range of valid logical addresses: from 0 up to, but not including, the limit.
When a process in DRAM is scheduled for CPU execution, it is the job of the operating system (specifically the dispatcher) to load these 2 registers with the appropriate values saved in the Process Control Block (PCB).
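As a rough sketch of that step, assuming a heavily simplified PCB and a plain struct standing in for the two hardware registers (all names here are hypothetical; a real kernel would load the registers with privileged instructions and keep far more state in the PCB):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PCB: only the fields this sketch needs. */
struct pcb {
    int      pid;
    uint32_t base;    /* saved Base (Relocation) Register value */
    uint32_t limit;   /* saved Limit Register value */
};

/* Hypothetical stand-in for the two MMU registers. */
struct mmu_regs {
    uint32_t base;
    uint32_t limit;
};

/* Called when `next` is scheduled to run: copy its saved values into the
 * MMU registers so that its logical addresses translate correctly. */
void dispatch(const struct pcb *next, struct mmu_regs *mmu)
{
    assert(next != NULL && mmu != NULL);
    mmu->base  = next->base;
    mmu->limit = next->limit;
}
```

Each context switch overwrites both registers, so whichever process is running is confined to its own region of DRAM.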
Every logical address generated by the CPU is first checked against the value in the Limit Register. If the logical address is less than that value, the base address (the value in the Base Register) is added to the logical address to get the corresponding physical address of the memory location to be referenced. Otherwise (the logical address is greater than or equal to the limit), it is an invalid memory access. In this case, the CPU traps to the operating system, and the OS terminates the program and produces a core dump of the process indicating the fatal error.
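The check-then-add logic above can be sketched in a few lines of C. This is a software model of what the hardware does, not real MMU code: the struct and function names are hypothetical, the trap is modelled as a false return value, and base + logical is assumed not to overflow 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two MMU registers. */
struct mmu_regs {
    uint32_t base;    /* starting physical address of the process */
    uint32_t limit;   /* valid logical addresses are 0 .. limit - 1 */
};

/* Translates a logical address into a physical one.
 * Returns false for an out-of-range access, where real hardware would
 * trap to the operating system; *physical is left untouched in that case. */
bool translate(const struct mmu_regs *mmu, uint32_t logical, uint32_t *physical)
{
    assert(mmu != NULL && physical != NULL);

    if (logical >= mmu->limit)
        return false;                     /* invalid access: trap */

    *physical = mmu->base + logical;      /* relocation: base + offset */
    return true;
}
```

For example, with base = 14000 and limit = 1000, logical address 346 translates to physical address 14346, while logical address 1000 is rejected.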
The diagram below, borrowed from Operating System Concepts (Silberschatz et al.), demonstrates this address translation.
Non-Contiguous Memory Allocation – In this strategy, a process can be allocated memory wherever it is available. Separate data structures and hardware support are required to do the mapping between virtual and physical addresses. The technique is called paging, and I will cover it in full detail in the next post, along with virtual memory.
Hi,
It’s a nice article. I’ve one question and hope you’ve some time to answer it.
Logical (or relative or relocatable) addresses are usually generated by the compiler, right? So why do we say something like “Every logical address generated by the CPU”?
Thanks for the comment Zeck. Sorry for the delayed response.
You are right. Logical addresses are generated by the compiler. My statement wasn’t meant to imply anything different. What I was trying to say is “every time the CPU makes a memory reference with a particular logical address”. I hope this clears up the confusion.