Sorry, I have to give you a two-part answer. First, the short answer on the loading of a player's fe: you're essentially correct. Every polygon still has to be drawn on the screen. Using one or two texture files across more polys would save video memory, but the polys still need to be processed and drawn. The front end is where the video crunching is done, regardless of where the instruction to change an object's state came from. It makes no difference whether that change originated on another PC or on the user's own PC; the instruction to redraw comes from the user's processor, based on the latest game-state information on the user's system. That game-state information is the state of the game as of the last frame update, adjusted by the last game-state packet received from the server and (maybe, depending on how HTC have written that end of it) by any game-state changes initiated on the user's PC since the last frame.
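Here's a rough sketch of that per-frame update in Python, assuming the game state is just a dictionary of object states keyed by an ID. The names (`ObjectState`, `build_frame_state`, `polygons_to_draw`) and the "apply the server packet, then local changes" order are my own assumptions for illustration, not how HTC actually do it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectState:
    """Position and heading of one object (plane, vehicle) as this client sees it."""
    x: float
    y: float
    heading: float


def build_frame_state(previous, server_packet, local_changes):
    """Hypothetical client update for one frame.

    Start from the state at the last frame, apply the changed-state info from the
    latest server packet, then apply changes made on this PC since the last frame.
    """
    state = dict(previous)        # copy, so the previous frame is left untouched
    state.update(server_packet)   # changes that originated on other PCs
    state.update(local_changes)   # changes that originated on this PC
    return state


def polygons_to_draw(state, polys_per_object):
    """Every object in the frame gets drawn, wherever its last change came from."""
    return sum(polys_per_object[obj_id] for obj_id in state)


if __name__ == "__main__":
    previous = {"me": ObjectState(0, 0, 0), "bandit": ObjectState(500, 200, 90)}
    server_packet = {"bandit": ObjectState(510, 200, 90)}  # bandit moved on its owner's PC
    local_changes = {"me": ObjectState(5, 0, 0)}           # I moved on my own PC
    frame = build_frame_state(previous, server_packet, local_changes)
    print(frame)
    print(polygons_to_draw(frame, {"me": 1500, "bandit": 1500}))
```

The point is the last function: the drawing cost depends on how many polys are in the frame, not on whether a change came in from the server or was made locally.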
I can't speak to exactly what the AH server does and doesn't look after because I'm not privy to that information, but it would make sense to me that the AH server is roughly analogous to a FedEx hub: it collects the changed-state messages it receives from each of the PCs connected to it, logs and collates them, and then transmits the collated changed-state (or game-state) packages to all the connected PCs. Translation of those game-state messages happens on the CPU side of a user's system, so it shouldn't have any effect on frame rate at all. The AH game server likely tracks only game-state information and sends out only changed-state information.
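To make the hub analogy a bit more concrete, here's a minimal Python sketch of a relay that collects, logs, collates, and broadcasts only changed state. Everything in it (`StateHub`, its methods, "last message for an object wins" as the collation rule, one broadcast per tick) is a hypothetical illustration, not HTC's actual server design:

```python
from collections import defaultdict


class StateHub:
    """Hypothetical 'FedEx hub' relay: collect, log, collate, broadcast changed state."""

    def __init__(self):
        self.clients = set()
        self.pending = {}                   # object id -> latest changed state this tick
        self.log = []                       # every message received, in arrival order
        self.outboxes = defaultdict(list)   # client id -> packets waiting to be sent

    def connect(self, client_id):
        self.clients.add(client_id)

    def receive(self, client_id, changes):
        """Log a changed-state message from one PC and fold it into this tick's batch."""
        self.log.append((client_id, dict(changes)))
        self.pending.update(changes)        # collate: a later change to the same object wins

    def broadcast(self):
        """Send the collated changes to every connected PC, then start a fresh batch."""
        packet = dict(self.pending)
        if not packet:
            return packet                   # nothing changed, so nothing is sent
        for client_id in self.clients:
            self.outboxes[client_id].append(dict(packet))
        self.pending.clear()
        return packet


if __name__ == "__main__":
    hub = StateHub()
    hub.connect("pc_a")
    hub.connect("pc_b")
    hub.receive("pc_a", {"plane_a": (100, 200)})
    hub.receive("pc_b", {"plane_b": (300, 400)})
    hub.receive("pc_a", {"plane_a": (110, 205)})  # newer update for the same plane
    print(hub.broadcast())
```

Note that the server never renders anything here; it only shuffles small state records around, and each PC turns those records into polys on its own end.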
Anyway, I hope my explanation is both fairly accurate and makes some sense. I don't have access to HTC's methodology, so I'm basing my answer on what I've read and found out on my own about building multiplayer twitch games.
Cheers,
asw