delphij's Chaos


05 Jan 2007

ULE 2.0 hits -HEAD

Today, Jeff Roberson has committed his version 2.0 ULE scheduler. This new version has addressed several design issues as well as several bugs.

The new scheduler has adopted a circular queue, instead of the double-queue structure which is also found in the Linux O(1) scheduler. The latter has lead to difficulty implementing nice correctly.

For uniprocessor case, ULE is now faster.

MP algorithm has been simplified a bit.

A lot of bugfixes, etc.

To quote the original commit message:

ULE 2.0:

  • Remove the double queue mechanism for timeshare threads. It was slow due to excess cache lines in play, caused suboptimal scheduling behavior with niced and other non-interactive processes, complicated priority lending, etc.
  • Use a circular queue with a floating starting index for timeshare threads. Enforces fairness by moving the insertion point closer to threads with worse priorities over time.
  • Give interactive timeshare threads real-time user-space priorities and place them on the realtime/ithd queue.
  • Select non-interactive timeshare thread priorities based on their cpu utilization over the last 10 seconds combined with the nice value. This gives us more sane priorities and behavior in a loaded system as compared to the old method of using the interactivity score. The interactive score quickly hit a ceiling if threads were non-interactive and penalized new hog threads.
  • Use one slice size for all threads. The slice is not currently dynamically set to adjust scheduling behavior of different threads.
  • Add some new sysctls for scheduling parameters.

Bug fixes/Clean up:

  • Fix zeroing of td_sched after initialization in sched_fork_thread() caused by recent ksegrp removal.
  • Fix KSE interactivity issues related to frequent forking and exiting of kse threads. We simply disable the penalty for thread creation and exit for kse threads.
  • Cleanup the cpu estimator by using tickincr here as well. Keep ticks and ltick/ftick in the same frequency. Previously ticks were stathz and others were hz.
  • Lots of new and updated comments.
  • Many many others.

Tested on: up x86/amd64, 8way amd64.

David Xu has pointed out that, in reply to Jeff’s announcement:

“I think it might be not a right way to work on FreeBSD thread scheduler, it is more important to work out a cpu dispatcher rather than inventing a dynamic priority algorithm to replace 4BSD’s algorithm, the 4BSD dynamic priority algorithm is still the best one I can find, it provides very good fairness. the most important thing is there should be a cpu dispatcher which knows how to place a thread on a cpu with cpu affinity-aware, maybe multiple runqueues, it knows cpu topology, and may be NUMA awareness, maybe provide cpu partitions, root can create and destroy a partition, root can add cpu to the partition or remove a cpu from the parition or move a cpu from partition a to partition b, bind applications to a partition etcs. On the top of cpu-dispatcher, there could be 4BSD or other dynamic priority alogrithm, but that’s less important than this one. with this thought, I am going to remove sched_core as I found the cpu dispatcher is the key thing.”