It's a fairly simple idea, and it seems at least reasonably accurate in practice: EJ types, being the dynamic, rational, linear-energetic temperament, would tend to have sharper movements and ways of expressing themselves, and to be relatively energetic, whereas IP types, being the dynamic, irrational, "receptive-adaptive" temperament, would generally have slower, smoother movements and expressions. It's not really a leap of faith; it seems rather obvious, given that socionics describes how our minds focus, and considering that our minds control our bodies. While it's not a PERFECT correlation in practice, generally it's not a horrible assumption, and it can manifest VERY obviously in some people.
It basically comes down to making one very basic theoretical assumption (not a theoretical framework, but a functional assumption based on incidences of correlation) and observing it in practice. And it happens to be pretty clearly attributable to an "abstraction" of "external dynamics".
I can see your rationale, but again, when you break what Ashton does down according to IM, it's really external dynamics more than anything.