This is a great idea. A simple screen gesture in the companion app could launch the voice activation to listen for a command. For that matter, perhaps screen gestures could be used for all of the commands: double tap for “Ride on”, swipe left to turn left, swipe right to turn right, double swipe left for an elbow flick, etc. That would be much easier to manage while riding. Plus, a drop of sweat randomly landing on the stop button wouldn’t inadvertently end a workout.
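For what it's worth, the gesture idea boils down to a small lookup table. Here is a rough sketch in Python; the gesture and command names are made up for illustration and are not taken from any real Zwift or companion app API:

```python
from typing import Optional

# Hypothetical gesture and command names, for illustration only.
GESTURE_COMMANDS = {
    "double_tap": "ride_on",
    "swipe_left": "turn_left",
    "swipe_right": "turn_right",
    "double_swipe_left": "elbow_flick",
}


def command_for(gesture: str) -> Optional[str]:
    """Return the command mapped to a gesture, or None if it is unmapped."""
    return GESTURE_COMMANDS.get(gesture)
```

Nothing is mapped to a single tap here, so a stray touch (such as a drop of sweat) would simply return None and be ignored.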
I put together a simple app (Mac only for now) that does exactly that. You might want to check it out:
search for zwift-voice on GitHub (I cannot add links here).