delphij's Chaos

I picked the word "chaos" because, well, it is really hard to find a more fitting word to describe this place…

29 Jan 2004

junsu's new PID allocation code

I have installed this code on my test box. Well… an automated patch mechanism would be better, but the current approach is good enough.

The original code was ported from NetBSD, where it was announced on March 11th by David Laight. I will quote the original announcement here:

“The main benefits are:

  • pid and pgrp lookup (by id) doesn’t require a search
  • no dependency on MAXUSERS
  • automatically scales well to large numbers of processes
  • small data footprint for small systems
  • ability to enumerate through all the processes without holding a lock
    for the entire duration, or having very messy locking rules.
    (the allproc list and p_list fields could be depracted later).
  • Largely MP clean

The basic idea is to ensure that you allocate a pid number which
references an empty slot in the lookup table. To do this a FIFO freelist
is linked through the free slots of the lookup table.
To avoid reusing pid numbers, the top bits of the pid are incremented
each time the slot is reused (the last value is kept in the table).
If the table is getting full (ie a pid would be reused in a relatively
small number of forks), then the table size is doubled.
Orphaned pgrps correctly stop the pid being reused, orphaned sessions
keep the pgrp allocated.

Below is the main part of the change, there are other bits lurking
in fork and exit. If people think this code is ok, I’ll sort out a
full diff against ‘current’ (and fix for sparc64).

(I’ve been running this code for months!)”
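To make the quoted idea a bit more concrete, here is a toy model of such a table in Python. It is only a sketch under my own assumptions, not the NetBSD or FreeBSD code: the class name `PidTable`, the "grow when half the slots are used" rule and the choice to skip pid 0 are all made up for illustration, and pgrp/session handling and the PID_MAX wrap are left out entirely.

```python
from collections import deque


class PidTable:
    """Toy slot-indexed pid table (illustrative only, not the kernel code)."""

    def __init__(self, size=4):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        self.size = size
        self.owner = [None] * size   # process stored in each slot
        self.pid = [None] * size     # pid currently live in each slot, None if free
        # Pid each slot hands out next; slot 0 starts at `size` so pid 0 is never used.
        self.next_pid = [i if i else size for i in range(size)]
        # FIFO freelist of slot indices.
        self.free = deque(list(range(1, size)) + [0])

    def alloc(self, proc):
        # Assumed policy: grow once half the table is in use, so that a
        # freed pid is not handed out again after only a few forks.
        if len(self.free) <= self.size // 2:
            self._grow()
        slot = self.free.popleft()
        self.owner[slot] = proc
        self.pid[slot] = self.next_pid[slot]
        return self.pid[slot]

    def release(self, pid):
        slot = pid & (self.size - 1)
        assert self.pid[slot] == pid, "pid is not allocated"
        self.owner[slot] = None
        self.pid[slot] = None
        # Bump the bits above the slot index so the same number is not reused.
        self.next_pid[slot] = pid + self.size
        self.free.append(slot)

    def lookup(self, pid):
        # No search: the low bits of the pid are the slot index.
        slot = pid & (self.size - 1)
        return self.owner[slot] if self.pid[slot] == pid else None

    def _grow(self):
        old, new = self.size, self.size * 2
        owner, cur, nxt = [None] * new, [None] * new, [0] * new
        fresh = []
        for i in range(old):
            live = self.pid[i] is not None
            # The pid that decides where old slot i lands in the bigger table.
            p = self.pid[i] if live else self.next_pid[i]
            a = p & (new - 1)   # slot that keeps pid p
            b = a ^ old         # its newly created sibling slot
            owner[a], cur[a] = self.owner[i], self.pid[i]
            nxt[a] = p + new if live else p
            nxt[b] = p + old
            fresh.append(b)
        # Old free slots keep their FIFO order; the new siblings queue behind them.
        moved = [self.next_pid[i] & (new - 1) for i in self.free]
        self.size, self.owner, self.pid, self.next_pid = new, owner, cur, nxt
        self.free = deque(moved + fresh)
```

With a fresh `PidTable(size=4)`, the first two calls to `alloc` return 1 and 2; after `release(1)` the number 1 is not handed out again, because that slot now continues from 5. Lookup is a mask and one comparison, which is the "doesn’t require a search" part of the announcement.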

Traditionally, the BSD pid allocator uses a linear allocation policy, which does not scale well in the MP case. If I am right, Linux 2.4.x (and older versions) and many other *nix systems use the same approach. On Windows the behavior is somewhat different: you rarely see PIDs > 5000, whereas on *nix systems it is very common to see a PID of 90000 or even higher before the counter wraps back to a certain low number (e.g. 100).
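For comparison, the linear policy can be sketched roughly like this. This is a simplification of my own, not the actual kernel routine: `next_pid_linear`, `PID_MAX` and `PID_RESTART` are illustrative names and values, and the list of live pids stands in for the kernel's process list (which is also checked against pgrp and session ids in the real thing).

```python
PID_MAX = 99999      # assumed upper bound, for illustration only
PID_RESTART = 100    # assumed low number the counter wraps back to


def next_pid_linear(last_pid, in_use):
    """Return the next unused pid after last_pid.

    in_use is a plain list on purpose: every candidate is checked with a
    linear scan, which is the part that gets expensive with many processes.
    """
    pid = last_pid
    for _ in range(PID_MAX + 1):
        pid += 1
        if pid > PID_MAX:
            pid = PID_RESTART   # wrap around instead of growing forever
        if pid not in in_use:   # O(number of processes) per candidate
            return pid
    raise RuntimeError("no free pid")
```

Each fork pays for a scan over the live processes, and in a real kernel that scan has to happen under a lock, which is where the MP trouble comes from.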

The biggest benefit the patch brings to FreeBSD, in my opinion, should be scalability.

An interesting behavior is that the new allocator will sometimes allocate a PID < 100 to a newly forked process. Is this correct? I’m not sure… I will post more results here.

Linux already adopted a new PID allocator, as well as a new scheduler, sometime in the 2.5.x days (FIXME: 2.5.36? I may have to check it out from bk). William Lee Irwin III’s IRC talk about Generalized PID Hashing is here; it is an interesting discussion.

(to be continued…)