| The Anatomy & Physiology of Memtest86-SMP |
| ----------------------------------------- |
| |
| 1. Binary layout |
| |
| --------------------------------------------------------------- |
| | bootsect.o | setup.o | head.o memtest_shared | |
| --------------------------------------------------------------- |
| Labels _start<-------memtest---------->_end |
| ----------------------------------------------------------- |
| addr 0 512 512+4*512 | |
| ----------------------------------------------------------- |
| |
| 2. The following steps occur after we power on. |
| a. The bootsect.o code gets loaded at 0x7c00 |
| and copies |
| i. itself to 0x90000 |
| ii. setup.o to 0x90200 |
| iii. everything between _start and _end i.e memtest |
| to 0x10000 |
| b. It then jumps into the copied bootsect.o code at 0x90000, |
| does some trivial work and jumps to setup.o |
| c. setup.o puts the processor in protected mode, with a basic |
| gdt and idt and does a long jump to the start of the |
| memtest code (startup_32, see 4 below). The code and data |
| segment base addresses are all set to 0x0, so we get a flat |
| linear address range. Paging is not enabled. |
| d. From now on we no longer require the bootsect.o and setup.o |
| code. |
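| The copy targets in step 2a can be sketched as a small table. |
| This is only an illustration: the struct, its field names and the |
| helper functions are hypothetical, and the image offsets are read |
| off the layout diagram in section 1. real_mode_linear() shows the |
| usual real mode rule (segment * 16 + offset) that relates a |
| segment value to the linear addresses quoted above. |

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One piece of the boot image and where step 2a places it.
 * Illustrative type, not taken from the memtest sources. */
struct boot_region {
    const char *name;
    uint32_t image_offset;  /* byte offset inside the boot image */
    uint32_t load_addr;     /* linear address after the copy */
};

static const struct boot_region boot_regions[] = {
    { "bootsect.o", 0,             0x90000 },
    { "setup.o",    512,           0x90200 },
    { "memtest",    512 + 4 * 512, 0x10000 },
};

/* Real mode linear address: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Look up the load address of a region by name; returns 0 if unknown. */
uint32_t load_addr_of(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof boot_regions / sizeof boot_regions[0]; i++) {
        if (strcmp(boot_regions[i].name, name) == 0)
            return boot_regions[i].load_addr;
    }
    return 0;
}
```

| For example, real_mode_linear(0x9000, 0x200) gives 0x90200, the |
| address setup.o is copied to. |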
| 3. The code in memtest is compiled as position independent |
| code (PIC), which means that the code can be moved dynamically in |
| the address space and still work. Since we are now in head.o, |
| which is compiled as PIC, we should no longer use absolute |
| address references when accessing functions or globals. |
| All symbols are addressed relative to a table called the Global |
| Offset Table (GOT), and %ebx is set to point to the base of that |
| table. So to get/set the value of a symbol we need to access |
| (%ebx + symbolOffsetFromGOT). For example, if foo is a global |
| variable, the assembly code to store the %eax value into foo |
| changes from |
| mov %eax, foo |
| to |
| mov %eax, foo@GOTOFF(%ebx) |
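| The effect of base relative addressing can be imitated in plain C. |
| This is a minimal sketch under stated assumptions: the byte |
| buffers stand in for the memtest image, got_base plays the role |
| of %ebx, and FOO_GOTOFF is a made up offset. Because foo is only |
| ever reached as base + offset, the same code keeps working after |
| the image has been copied elsewhere. |

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical offset of foo from the table base, fixed at link time. */
enum { FOO_GOTOFF = 8 };

/* Store into foo using only base relative addressing,
 * imitating: mov %eax, foo@GOTOFF(%ebx) */
void store_foo(uint8_t *got_base, uint32_t value)
{
    assert(got_base != NULL);
    memcpy(got_base + FOO_GOTOFF, &value, sizeof value);
}

/* Load foo the same way, imitating: mov foo@GOTOFF(%ebx), %eax */
uint32_t load_foo(const uint8_t *got_base)
{
    uint32_t value;
    assert(got_base != NULL);
    memcpy(&value, got_base + FOO_GOTOFF, sizeof value);
    return value;
}

/* "Relocate" the image by copying its bytes to a new buffer. */
void relocate_image(uint8_t *dst, const uint8_t *src, size_t len)
{
    memcpy(dst, src, len);
}
```

| After relocate_image(), passing the new buffer as got_base is all |
| that is needed for load_foo() and store_foo() to reach the moved |
| copy of foo. |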
| 4. (startup_32) The first step done in head.o is to change |
| the gdtr and idtr register values to point to the final(!) |
| gdt and idt tables in head.o, since we can no longer use the |
| gdt and idt tables in setup.o, and to call the dynamic linker |
| stub in memtest_shared (see call _dl_start in head.S). This |
| dynamic linker stub relocates all the code in memtest w.r.t. |
| the new base location, i.e. 0x10000. Finally we call the |
| test_start() 'C' routine. |
| 5. The test_start() C routine is the main routine which lets the BSP |
| bring up the APs from their halt state, relocate the code |
| (if necessary) to a new address, move the APs to the newly |
| relocated address and execute the tests. The BSP is the |
| master which controls the execution of the APs, and mostly |
| it is the one which manipulates the global variables. |
| i. we change the stack to a private per cpu stack. |
| (this step happens every time we move to a new location) |
| ii. We kick start the APs in the system by |
| a. Putting a temporary real mode code |
| (_ap_trampoline_start - _ap_trampoline_protmode) |
| at 0x9000, which puts the AP in protected mode and jumps |
| to _ap_trampoline_protmode in head.o. The code in |
| _ap_trampoline_protmode calls startup_32 in head.o which |
| reinitialises the AP's gdt and idt to point to the |
| final(!) gdt and idt. (see step 4 above) |
| b. Since the APs also traverse through the same initialisation |
| code(startup_32 in head.o), the APs also call test_start(). |
| The APs just spin wait (see AP_SpinWaitStart) till they |
| are instructed by the BSP to jump to a new location, |
| which can either be a test execution or spin wait at a |
| new location. |
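| The spin wait handshake in ii.b can be pictured as a one word |
| mailbox. This is a sketch only: struct ap_mailbox, |
| bsp_post_target() and ap_poll() are hypothetical names, C11 |
| atomics stand in for whatever synchronisation the real code uses, |
| and the actual AP_SpinWaitStart code may work quite differently. |

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-AP mailbox: 0 means "keep spinning". */
struct ap_mailbox {
    atomic_uintptr_t jump_target;
};

/* BSP side: tell the AP where to jump next. */
void bsp_post_target(struct ap_mailbox *mb, uintptr_t target)
{
    assert(target != 0);
    atomic_store(&mb->jump_target, target);
}

/* AP side: one iteration of the spin loop. Returns 0 if the AP should
 * keep spinning, otherwise the posted target, which is consumed. */
uintptr_t ap_poll(struct ap_mailbox *mb)
{
    return atomic_exchange(&mb->jump_target, (uintptr_t)0);
}
```

| An AP would call ap_poll() in a loop and jump as soon as it |
| returns a non-zero address, which is either a test routine or the |
| spin wait location in a relocated copy of the code. |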
| iii. The base address at which memtest tries to execute, as far |
| as possible, is 0x2000. This is the lowest possible address |
| memtest can put itself at. So the next step is to |
| move to 0x2000, which it cannot do directly, since copying |
| code to 0x2000 would overwrite the existing code at 0x10000 |
| (0x2000 + sizeof(memtest) will usually be greater than 0x10000). |
| So we temporarily relocate to 0x200000 and then relocate |
| back to 0x2000. Every time the BSP relocates the code to a |
| new location, it pulls the APs spin waiting at the old |
| location over to the corresponding relocated |
| spin wait location, by making them jump to the |
| relocated startup_32 (see 4 above). |
| Henceforth, during the tests, 0x200000 is the only place |
| we relocate to if we need to test a memory window |
| (see v. below for a description of what a window is) |
| which includes address 0x2000. |
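| The reason for the detour through 0x200000 is a simple range |
| overlap check. The sketch below is an illustration, not the |
| relocation code itself: ranges_overlap() and first_hop() are |
| hypothetical helpers, the image size used in the example is made |
| up, and any overlap between the running copy and the destination |
| is treated as unsafe. |

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_ADDR   0x2000u    /* preferred base, from the text */
#define HIGH_ADDR  0x200000u  /* temporary base, from the text */

/* True if [a, a + len) and [b, b + len) share at least one byte. */
bool ranges_overlap(uint32_t a, uint32_t b, uint32_t len)
{
    assert(len > 0);
    return a < b + len && b < a + len;
}

/* First hop of a move from cur to LOW_ADDR: LOW_ADDR itself when the
 * copy cannot touch the running image, otherwise HIGH_ADDR. */
uint32_t first_hop(uint32_t cur, uint32_t len)
{
    return ranges_overlap(cur, LOW_ADDR, len) ? HIGH_ADDR : LOW_ADDR;
}
```

| With the code running at 0x10000 (step 2a) and an assumed image |
| size of 0x18000 bytes, first_hop(0x10000, 0x18000) returns |
| 0x200000. From 0x200000 the same call returns 0x2000, since the |
| two ranges no longer overlap. |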
| |
| Address map during normal execution. |
| -------------------------------------------------------------------- |
| | head.o memtest_shared | |RAM_END |
| -------------------------------------------------------------------- |
| Labels _start<-------memtest---------->_end |
| -------------------------------------------------------------------- |
| addr 0x0 0x2000 | Memory that is being tested.. |RAM_END |
| -------------------------------------------------------------------- |
| |
| Address map during relocated state. |
| -------------------------------------------------------------------- |
| | head.o memtest_shared | |RAM_END |
| -------------------------------------------------------------------- |
| Labels _start<-------memtest---------->_end |
| -------------------------------------------------------------------- |
| addr memory that is being tested... |0x200000 | |RAM_END |
| -------------------------------------------------------------------- |
| |
| iv. Once we are at 0x2000 we initialise the system and |
| determine the memory map, usually via the BIOS e820 map. |
| The sorted, non-overlapping RAM page ranges are |
| placed in the v->pmap[] array. This array is the reference |
| RAM memory map of the system. |
| v. The memory range (in page numbers) which |
| memtest86 can test is partitioned into windows. |
| The current version of memtest86-smp can |
| test memory from 0x0 - 0xFFFFFFFFF (the max address |
| when PAE mode is enabled). |
| We then compute the linear memory address ranges (called |
| segments) for the window we are currently about to |
| test. The windows are |
| a. 0 - 640K |
| b. (0x2000 + (_end - _start)) - 4G (since the code is at 0x2000). |
| c. >4G, to test the PAE address range, each window with a size |
| of 0x80000 pages (2G), large enough to be mapped in one page directory |
| entry. So a window size of 0x80000 means we can map 1024 page |
| table entries, with a page size of 2M (PAE mode), with one |
| page directory entry. Something similar to the kseg entry |
| in Linux. The upper bound page number is 0x1000000, which |
| corresponds to linear address 0xFFFFFFFFF + 1, which uses |
| all the 36 address bits. |
| Each window is compared against the sorted & non-overlapping |
| e820 map which we have stored in the v->pmap[] array, since not |
| all memory in the selected window address range may correspond to |
| RAM or be usable. A list of segments within the window is |
| created, which contains the usable portions of the window. |
| This is stored in the v->mmap[] array. |
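| Building the segment list amounts to clipping each RAM range to |
| the window. The following is a minimal sketch, assuming half-open |
| page number ranges; struct page_range and compute_segments() are |
| illustrative names and do not reflect the actual layout of |
| v->pmap[] or v->mmap[]. |

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range of page numbers [start, end). Illustrative type. */
struct page_range {
    uint64_t start;
    uint64_t end;
};

/* Clip every RAM range in pmap[] to the window and store the non-empty
 * results in segs[]. pmap[] must be sorted and non-overlapping, so the
 * output is sorted and non-overlapping too.
 * Returns the number of segments written (at most max_segs). */
size_t compute_segments(const struct page_range *pmap, size_t npmap,
                        struct page_range window,
                        struct page_range *segs, size_t max_segs)
{
    size_t n = 0;

    assert(window.start <= window.end);
    for (size_t i = 0; i < npmap && n < max_segs; i++) {
        uint64_t s = pmap[i].start > window.start ? pmap[i].start : window.start;
        uint64_t e = pmap[i].end < window.end ? pmap[i].end : window.end;

        if (s < e) {            /* keep only non-empty intersections */
            segs[n].start = s;
            segs[n].end = e;
            n++;
        }
    }
    return n;
}
```

| A window that falls entirely inside a hole in the memory map |
| yields zero segments, so there is nothing to test in it. |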
| vi. Once the v->mmap[] array is populated, we have the list of |
| non-overlapping segments in the current window which are the |
| final address ranges that can be tested. The BSP executes the |
| test first and then lets each AP execute the test one by one. |
| Once all the APs finish executing the same test, the BSP moves |
| to the next window and follows the same procedure till all the |
| windows are done. Once all the windows are done, the BSP moves |
| to the next test. Before executing in any window the BSP checks |
| if the window overlaps with the code/data of memtest86, and if so |
| tries to relocate to 0x200000. If the window includes both |
| 0x2000 as well as 0x200000 the BSP skips that window. |
| Looking at the window values, the only time memtest |
| relocates is when testing the 0 - 640K window. |
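| The per-window decision in vi. can be summarised as follows. This |
| is a sketch under stated assumptions: plan_window() and the enum |
| are hypothetical, ranges are half-open byte ranges, and the code |
| size passed in is an arbitrary example value rather than the real |
| (_end - _start). |

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_ADDR   0x2000ULL    /* normal location of memtest */
#define HIGH_ADDR  0x200000ULL  /* relocated location of memtest */

enum window_action { RUN_AT_LOW, RUN_AT_HIGH, SKIP_WINDOW };

/* Does [ws, we) overlap [cs, ce)? */
static bool overlaps(uint64_t ws, uint64_t we, uint64_t cs, uint64_t ce)
{
    return ws < ce && cs < we;
}

/* Decide where the code must live while the window [win_start, win_end)
 * is being tested, or whether the window has to be skipped. */
enum window_action plan_window(uint64_t win_start, uint64_t win_end,
                               uint64_t code_size)
{
    assert(win_start < win_end && code_size > 0);

    bool hits_low  = overlaps(win_start, win_end, LOW_ADDR, LOW_ADDR + code_size);
    bool hits_high = overlaps(win_start, win_end, HIGH_ADDR, HIGH_ADDR + code_size);

    if (!hits_low)
        return RUN_AT_LOW;      /* no conflict with the code at 0x2000 */
    if (!hits_high)
        return RUN_AT_HIGH;     /* relocate to 0x200000 first */
    return SKIP_WINDOW;         /* window covers both locations */
}
```

| With an assumed code size of 0x18000, the 0 - 640K window gives |
| RUN_AT_HIGH, while a window starting at 0x2000 + 0x18000 gives |
| RUN_AT_LOW, matching the observation that only the first window |
| forces a relocation. |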
| |
| Known Issues: |
| * Memtest86-smp does not work on IBM-NUMA machines, x440 and friends. |
| |
| email comments to: |
| Kalyan Rajasekharuni<kc_rajasekharuni@yahoo.com> |
| Sub: Memtest86-SMP |