I have concluded that the best, and probably the easiest, way to achieve a good result is to have the motion capture actor stand with their feet together. This way, when weighting the skin to the foot controls, it is easier to maintain an equal distance from the wheelbase.
While running my tests I compared both methods, and although I do love the way Vardo jumps all about in the version where the entire mesh is parented to the rig, I much prefer the usability of having the wheels and base separate. Going forward, I will consider trialing a blend shape or an nCloth simulation.
Overall, I am happy that I made it work to a satisfactory level, but I would love to see this through to a more usable and versatile solution.
Below is a comparison of the two different models I trialed.
I have dozens of other test videos!