The problem with graphics is simply that whoever designed early video boards decided to make them clock out the most significant bit (MSB) first. Whether this was because the early graphics boards were designed for big-endian computers, or because it made the graphics look like their conventionally printed binary or hex representation, I don't know. My guess would be that early character-based designs clocked out the MSB of the character ROM first for the above reasons, and because the bit order used there had no effect whatsoever on the rest of the system. When people first designed bitmap displays, they could easily have made them clock out the least significant bit (LSB) first, but at the time there was no reason to do so.
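To make the "looks like its printed binary" point concrete, here is a minimal Python sketch of a hypothetical 1-bit-per-pixel shifter (`byte_to_pixels` is an illustrative name, not any real board's interface). With MSB-first output, a byte written as `0b11100000` lights the three leftmost pixels, matching how the number is printed; an LSB-first shifter mirrors the pattern within the byte.

```python
def byte_to_pixels(value: int, msb_first: bool = True) -> str:
    """Return the 8 pixels a hypothetical shifter emits for one byte, left to right.

    '#' is a lit pixel, '.' an unlit one.
    """
    # MSB-first walks bits 7..0; LSB-first walks bits 0..7.
    bits = range(7, -1, -1) if msb_first else range(8)
    return "".join("#" if (value >> b) & 1 else "." for b in bits)


glyph_row = 0b11100000  # high bits written on the left, as in source code
print(byte_to_pixels(glyph_row))                   # prints ###.....
print(byte_to_pixels(glyph_row, msb_first=False))  # prints .....###
```

With MSB-first shifting, a character ROM or bitmap can be drawn on graph paper and typed in directly as binary or hex, which may be part of why the convention stuck.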
BTW, if performance rather than compatibility is the goal, designing a graphics system to clock out LSB first would be trivial. Alternatively, the address counter could be changed to count in reverse sequence. It might seem a little odd to have (0,0) be the lower-right corner of the screen, but if the code is written for that, I see no reason to expect any performance disadvantage from such a system.
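The reverse-counter idea can be checked with a small simulation. This is a minimal Python sketch assuming a 1-bit-per-pixel display whose shifter still clocks MSB first; `scan_out` and `set_pixel_le` are hypothetical names. When the address counter counts down, plain little-endian bit numbering (bit `index & 7` of byte `index >> 3`) lines up with screen order reversed, so pixel 0 lands in the lower-right corner.

```python
def scan_out(frame: bytes, descending: bool = False) -> list[int]:
    """Simulate a 1bpp video shifter that always clocks out MSB first.

    Returns the pixel stream in screen order (top-left first). If descending
    is True, the address counter counts down from the last byte instead of
    up from the first.
    """
    addresses = range(len(frame) - 1, -1, -1) if descending else range(len(frame))
    pixels = []
    for addr in addresses:
        for bit in range(7, -1, -1):  # MSB first, as on early boards
            pixels.append((frame[addr] >> bit) & 1)
    return pixels


def set_pixel_le(frame: bytearray, index: int) -> None:
    """Hypothetical helper: set pixel `index` using little-endian bit numbering."""
    frame[index >> 3] |= 1 << (index & 7)


WIDTH, HEIGHT = 16, 2                  # tiny 16x2 screen, 4 bytes total
frame = bytearray(WIDTH * HEIGHT // 8)
set_pixel_le(frame, 0)
set_pixel_le(frame, 9)

stream = scan_out(bytes(frame), descending=True)
lit = [pos for pos, p in enumerate(stream) if p]
print(lit)  # prints [22, 31]
```

On this 16x2 screen, pixel 0 appears at stream position 31 (row 1, column 15, the lower-right corner) and pixel 9 at position 22; in general, screen position = total pixels - 1 - index, so software only has to agree on the mirrored origin.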
Graphics buffers are arranged to match the natural order of rasterization (left to right, top to bottom), and that arrangement is the most efficient way to address a bit in a field of arbitrary length.
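As a rough illustration of raster-order addressing, the sketch below assumes a row-major 1bpp buffer with MSB-first bytes and a hypothetical `stride` (bytes per scanline). Locating any pixel costs one multiply-add plus a shift and mask, and walking pixels in raster order never moves backwards in memory, so the display's address counter only ever has to increment.

```python
def pixel_address(x: int, y: int, stride: int) -> tuple[int, int]:
    """Byte offset and MSB-first bit mask for pixel (x, y) in a row-major 1bpp buffer.

    stride is the number of bytes per scanline (assumed >= ceil(width / 8)).
    """
    offset = y * stride + (x >> 3)  # whole scanlines, then whole bytes
    mask = 0x80 >> (x & 7)          # leftmost pixel in each byte is the MSB
    return offset, mask


WIDTH, HEIGHT, STRIDE = 20, 3, 3    # 20 pixels need 3 bytes per scanline
offsets = [pixel_address(x, y, STRIDE)[0] for y in range(HEIGHT) for x in range(WIDTH)]
assert offsets == sorted(offsets)   # raster order only ever increments the address
print(pixel_address(19, 2, STRIDE))  # prints (8, 16)
```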