* command queue for atomic operations
* header file should be re-organized
* for bit control, we usually need to read the IO port, apply the
  bit operation to the value just read back, and finally write the
  result out again.  To speed this up, we can keep a "shadow register"
  that holds the IO port's current value in memory.  Instead of reading
  the value back from the port, we use the value in the shadow register,
  which saves one port read per bit operation and improves performance
  a little.
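  A minimal sketch of the shadow-register idea, assuming an 8-bit port
  that only this code writes and whose bits the hardware never changes
  on its own.  port_read(), port_write(), fake_port and struct shadow_reg
  are hypothetical stand-ins, not an existing driver API; real code
  would use inb()/outb() or MMIO accessors, and concurrent callers would
  need a lock around each read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a real 8-bit IO port (assumption: on real
 * hardware this would be accessed with inb()/outb() or MMIO). */
static volatile uint8_t fake_port;
static unsigned port_reads; /* counts how often the (slow) port is read */

static uint8_t port_read(void)
{
    port_reads++;
    return fake_port;
}

static void port_write(uint8_t value)
{
    fake_port = value;
}

/* Shadow register: in-memory copy of the last value written to the port. */
struct shadow_reg {
    uint8_t value;
};

/* Read the port once at init so the shadow starts in sync with it. */
static void shadow_init(struct shadow_reg *s)
{
    s->value = port_read();
}

/* Modify the shadow copy, then write it out; the port is never read. */
static void shadow_set_bit(struct shadow_reg *s, unsigned bit)
{
    assert(bit < 8);
    s->value |= (uint8_t)(1u << bit);
    port_write(s->value);
}

static void shadow_clear_bit(struct shadow_reg *s, unsigned bit)
{
    assert(bit < 8);
    s->value &= (uint8_t)~(1u << bit);
    port_write(s->value);
}
```

  After shadow_init(), any number of shadow_set_bit()/shadow_clear_bit()
  calls touch the port only with writes, so the port is read exactly
  once no matter how many bit operations follow.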
