In my opinion, the worst disadvantage of the little-endian format is handling bitmapped data (e.g. graphics buffers and masks): the byte order within each word must be rearranged on every access to keep the data spatially coherent. That costs performance, and it is one of the reasons x86 processors have not been favored for graphics-intensive applications.
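To make that rearrangement concrete, here is a minimal C sketch; the 1-bit-per-pixel, MSB-first scanline layout and the function names are my own illustrative assumptions, not anything from the original discussion. It shows why a plain word load on a little-endian CPU scrambles the spatial order of the pixels, and what the fix-up costs:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical 1-bit-per-pixel scanline, MSB-first within each byte:
 * bit 7 of byte 0 is the leftmost pixel.  Pixels 0 and 31 are set. */
static const uint8_t scanline[4] = { 0x80, 0x00, 0x00, 0x01 };

/* On a little-endian CPU a plain 32-bit load puts byte 0 (the leftmost
 * pixels) into the LOW bits of the word, scrambling the spatial order.
 * Assembling the bytes high-to-low restores a word in which bit 31 is
 * the leftmost pixel -- the per-access rearrangement described above. */
static uint32_t load_coherent(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Test pixel x (0 = leftmost) of a spatially coherent word. */
static int pixel_set(uint32_t w, int x)
{
    return (w >> (31 - x)) & 1;
}

int main(void)
{
    uint32_t raw;
    memcpy(&raw, scanline, 4);                    /* native load */
    uint32_t coherent = load_coherent(scanline);

    printf("raw load:      0x%08x\n", raw);       /* 0x01000080 on x86 */
    printf("coherent word: 0x%08x\n", coherent);  /* 0x80000001 everywhere */
    printf("pixel 0: %d, pixel 31: %d\n",
           pixel_set(coherent, 0), pixel_set(coherent, 31));
    return 0;
}
```

On a big-endian machine `load_coherent` is just a normal word load; on a little-endian one it is an extra swap on every access.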
If I recall correctly, big-endian order can also be faster for longword shifts that cross word boundaries, symbol hashing, lookup-table indexing, and the like.
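As one illustration of the symbol-hashing point, here is a hedged sketch (the `prefix4` helper and the four-byte-prefix scheme are my own example, not a claim about any particular system). On a big-endian machine the word load below comes for free, because the first character of the string already lands in the high byte:

```c
#include <stdint.h>
#include <stdio.h>

/* Load the first four bytes of a symbol name in big-endian order
 * (the caller must guarantee at least four bytes).  On a big-endian
 * machine this is just a word load; on a little-endian one the
 * shifts below perform the swap the hardware would do for free. */
static uint32_t prefix4(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

int main(void)
{
    /* Because the first character sits in the high byte, integer
     * comparison agrees with strncmp(s, t, 4), so a symbol table can
     * compare or hash four characters per word operation. */
    printf("%d\n", prefix4("abcd") < prefix4("abce"));  /* prints 1 */
    printf("bucket: %u\n", prefix4("main") % 64);       /* cheap prefix hash */
    return 0;
}
```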
The problem with graphics is simply that whoever designed the early video boards decided to make them clock out the most significant bit first. Whether that was because the early graphics boards were designed for big-endian computers, or because it made the on-screen bits match their conventionally printed binary or hex representation, I don't know. My guess is that early character-based designs clocked out the MSB of the character ROM first for those reasons, and because the bit order used there had no effect whatsoever on the rest of the system. When people first designed bitmap displays they could easily have made them clock out the LSB first, but at the time there was no reason to do so.
BTW, if performance rather than compatibility is the goal, designing a graphics system to clock out the LSB first would be trivial. Alternatively, the address counter could be changed to count in reverse sequence. It might seem a little odd to have (0,0) be the lower-right corner of the screen, but if the code is written for that, I see no reason to expect any performance disadvantage in such a system.
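For what a reverse-counting address counter would mean to software, here is a short sketch; the 640x480 resolution and the function names are purely illustrative assumptions on my part. The mapping is the ordinary linear one, just mirrored through the end of the buffer:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative resolution; any size works the same way. */
#define WIDTH  640u
#define HEIGHT 480u
#define PIXELS (WIDTH * HEIGHT)

/* Conventional mapping: (0,0) at the upper-left, addresses increasing
 * left-to-right, top-to-bottom. */
static uint32_t addr_normal(uint32_t x, uint32_t y)
{
    return y * WIDTH + x;
}

/* With an address counter that counts down, the same arithmetic is
 * mirrored through the end of the buffer, which puts (0,0) at the
 * lower-right corner of the screen. */
static uint32_t addr_reversed(uint32_t x, uint32_t y)
{
    return (PIXELS - 1u) - (y * WIDTH + x);
}

int main(void)
{
    printf("normal   (0,0)     -> %u\n", addr_normal(0, 0));    /* 0      */
    printf("reversed (0,0)     -> %u\n", addr_reversed(0, 0));  /* 307199 */
    printf("reversed (639,479) -> %u\n",
           addr_reversed(WIDTH - 1, HEIGHT - 1));               /* 0      */
    return 0;
}
```

The arithmetic cost is identical to the conventional mapping (one multiply, one add, one subtract that a compiler can fold into the base address), which is why I would expect no performance penalty.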