Start this. If we really want to support pre-ARMv6, we'll need to use different (and less reliable) instructions to implement atomic operations (here and in the write-barrier subprims).
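For reference, a minimal C sketch of the kind of atomic operations at stake. This is not the code in this change: the helper names `store_conditional` and `atomic_set_bit` are hypothetical, and the bitmap-style barrier is an assumption made for illustration. On ARMv6 and later, compilers typically lower these C11 atomics to LDREX/STREX retry loops; earlier cores only offer SWP, an unconditional swap, so a compare-and-swap has to be built some other way there.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: store new_value into *slot only if *slot still
   holds expected. Returns true if the store happened. */
static bool store_conditional(_Atomic uintptr_t *slot,
                              uintptr_t expected,
                              uintptr_t new_value)
{
    return atomic_compare_exchange_strong(slot, &expected, new_value);
}

/* Hypothetical helper: atomically set bit `bit` in *word, the way a
   write barrier might mark an entry in a shared bitmap (an assumption,
   not taken from the source). Returns true if the bit was already set. */
static bool atomic_set_bit(_Atomic uintptr_t *word, unsigned bit)
{
    uintptr_t mask = (uintptr_t)1 << bit;
    uintptr_t old = atomic_load(word);

    /* On failure, `old` is refreshed with the current value, so the
       loop retries with up-to-date contents. */
    while (!atomic_compare_exchange_weak(word, &old, old | mask)) {
        /* retry */
    }
    return (old & mask) != 0;
}
```

Both helpers depend on a load and a conditional store that fail cleanly when another thread intervenes, which is the guarantee that is missing before ARMv6.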