15 Feb 2004

FreeBSD/DragonFlyBSD VM bug

I have sent a heads-up to the DragonFlyBSD bugs list about the recent bug fixes that Dr. Alan Cox committed to FreeBSD-CURRENT.

The two commits are here and here. The first commit is more important than the second; taken together, they correct a long-standing race condition in the inactive queue scan and vm_page_try_to_cache().

Matthew Dillon soon replied to confirm the bug, and to my surprise he replied a second time only 20 minutes later (wow) with deeper thoughts on the issue. The main idea, in my opinion, is that on an SMP setup there is a more serious race condition that is hard to fix.

Alan then pointed to a page about Mach’s VM algorithm (FreeBSD’s VM is heavily based on it). I quote it here:

“CMU Mach

initiator locks the page table,
queues a shootdown request on each “responder”, and
sends them remote interrupts

each responder acknowledges the interrupt by removing itself from the list of active processors, and
spins on the page table lock, waiting for the update to be completed

initiator makes changes and
unlocks the page table, terminating spins
responders flush appropriate TLB entries and
proceed

(special care is needed to avoid deadlocks and to lock out interrupts at appropriate times)”
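
To make the quoted steps easier to follow, here is a small user-space sketch of the handshake, with Python threads standing in for CPUs. This is only my illustration, not code from Mach, FreeBSD or DragonFlyBSD: the names ShootdownSim, responder, initiator and run_shootdown are made up, the "remote interrupt" is a threading.Event, and the responders block on a lock instead of truly spinning. I also assume that the initiator waits for every responder to acknowledge before it changes the page table; the quote implies this but does not say it outright. Interrupt lock-out and deadlock avoidance are not modeled at all.

```python
import threading


class ShootdownSim:
    """Toy model of the quoted CMU Mach shootdown steps (hypothetical names)."""

    def __init__(self, ncpus):
        self.page_table = {0x1000: "frame_A"}
        self.page_table_lock = threading.Lock()
        # Each "CPU" starts with a TLB that caches the current mapping.
        self.tlbs = [dict(self.page_table) for _ in range(ncpus)]
        self.requests = [[] for _ in range(ncpus)]           # queued shootdowns
        self.ipi = [threading.Event() for _ in range(ncpus)]  # "remote interrupts"
        self.active = set(range(ncpus))                       # active processors
        self.ack = threading.Condition()                      # guards self.active

    def responder(self, cpu):
        self.ipi[cpu].wait()                 # the remote interrupt arrives
        with self.ack:
            self.active.discard(cpu)         # acknowledge: leave the active list
            self.ack.notify_all()
        with self.page_table_lock:           # stand-in for spinning on the lock
            pass
        for vaddr in self.requests[cpu]:     # flush the affected TLB entries
            self.tlbs[cpu].pop(vaddr, None)
        self.requests[cpu].clear()
        with self.ack:
            self.active.add(cpu)             # proceed

    def initiator(self, cpu, vaddr, new_frame):
        others = [c for c in range(len(self.tlbs)) if c != cpu]
        with self.page_table_lock:           # lock the page table
            for c in others:
                self.requests[c].append(vaddr)  # queue a shootdown request
                self.ipi[c].set()               # send the interrupt
            with self.ack:
                # Assumption: wait until every responder has acknowledged.
                self.ack.wait_for(lambda: self.active.isdisjoint(others))
            self.page_table[vaddr] = new_frame  # make the change
            self.tlbs[cpu].pop(vaddr, None)     # drop our own stale entry
        # Leaving the with-block unlocks the page table and ends the "spins".


def run_shootdown(ncpus=4, vaddr=0x1000, new_frame="frame_B"):
    sim = ShootdownSim(ncpus)
    workers = [threading.Thread(target=sim.responder, args=(c,))
               for c in range(1, ncpus)]
    for t in workers:
        t.start()
    sim.initiator(0, vaddr, new_frame)       # CPU 0 plays the initiator
    for t in workers:
        t.join()
    return sim
```

After run_shootdown() returns, the page table holds the new mapping and no simulated TLB still caches the old entry, which is exactly the guarantee the protocol is meant to provide. The point of the ordering is that nobody can use a stale translation while the initiator is in the middle of changing the page table.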

The race condition is very rare, so Alan plans to fix the problem on -CURRENT after he completes the pmap locking.

delphij's Chaos