After many years of hearing about Vertex Buffer Objects (VBOs), I finally decided to experiment with them (my stuff isn't normally performance critical, obviously).
I'll describe my experiment below, but to make a long story short, I'm seeing indistinguishable performance between "simple" immediate mode (glBegin()/glEnd()), vertex array (CPU side) and VBO (GPU side) rendering. I'm trying to understand why this is, and under what conditions I can expect VBOs to significantly outshine their primitive (pun intended) ancestors.
Experiment Details
For the experiment, I generated a static 3D Gaussian cloud containing a large number of points. Each point has a position and a color associated with it. Then I rotated the camera around the cloud in successive frames, in a sort of "orbiting" motion. Again, the points are static; only the eye moves (via gluLookAt()). The data are generated once, prior to any rendering, and stored in two arrays for use in the rendering loop.
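For reference, a minimal sketch of how such a cloud could be generated, assuming standard C++ random facilities. The names `PointCloud` and `makeGaussianCloud`, the seed, the sigma and the random color mapping are all illustrative assumptions, not the exact code used in the experiment.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container: two flat arrays, three floats per point,
// matching the "two arrays" layout described above.
struct PointCloud {
    std::vector<float> vertices; // x, y, z per point
    std::vector<float> colors;   // r, g, b per point
};

// Builds a static Gaussian cloud centred on the origin.
// sigma, seed and the uniform random colors are assumptions for illustration.
PointCloud makeGaussianCloud(std::size_t numPoints, float sigma = 1.0f, unsigned seed = 42) {
    assert(sigma > 0.0f);
    std::mt19937 rng(seed);
    std::normal_distribution<float> position(0.0f, sigma);
    std::uniform_real_distribution<float> channel(0.0f, 1.0f);

    PointCloud cloud;
    cloud.vertices.reserve(numPoints * 3);
    cloud.colors.reserve(numPoints * 3);
    for (std::size_t i = 0; i < numPoints; ++i) {
        for (int axis = 0; axis < 3; ++axis) {
            cloud.vertices.push_back(position(rng)); // one coordinate
            cloud.colors.push_back(channel(rng));    // one color channel
        }
    }
    return cloud;
}
```

Calling `makeGaussianCloud(1000000)` once before the render loop yields two flat arrays of 3,000,000 floats each, which is the layout that glVertex3fv()/glColor3fv() and the array-based paths can read directly.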
For immediate mode rendering, the entire data set is rendered in a single glBegin()/glEnd() block, with a loop that makes one call each to glColor3fv() and glVertex3fv() per point.
For vertex array and VBO rendering, the entire data set is rendered with a single glDrawArrays() call.
Then I simply ran each mode for a minute or so in a tight loop and measured the average FPS with the high performance timer.
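The measurement loop can be sketched roughly as follows. This is a portable stand-in, assuming `std::chrono::steady_clock` in place of the Windows high performance timer; `measureAverageFps` and the `renderFrame` callback are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Calls renderFrame in a tight loop for at least `duration`
// and returns the average number of frames per second.
double measureAverageFps(const std::function<void()>& renderFrame,
                         std::chrono::milliseconds duration) {
    assert(duration.count() > 0);
    using Clock = std::chrono::steady_clock;

    const Clock::time_point start = Clock::now();
    Clock::time_point now = start;
    long long frames = 0;
    do {
        renderFrame(); // draw one frame (and swap buffers) here
        ++frames;
        now = Clock::now();
    } while (now - start < duration);

    // Elapsed wall-clock time in seconds, as a double.
    const std::chrono::duration<double> elapsed = now - start;
    return static_cast<double>(frames) / elapsed.count();
}
```

In a real test the callback would issue the draw calls and the buffer swap for one frame. Since OpenGL commands can be queued, it may also be worth calling glFinish() at the end of the run so the timed interval covers the work the GPU actually completed.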
Performance Results
As mentioned above, performance was indistinguishable across the three modes on both my desktop machine (XP x64, 8 GB RAM, 512 MB Quadro 1700) and my laptop (XP32, 4 GB RAM, 256 MB Quadro NVS 110). It did scale as expected with the number of points, however. Vsync was disabled for all runs.
Specific results from laptop runs (rendering w/GL_POINTS):
glBegin()/glEnd():
- 1K pts --> 603 FPS
- 10K pts --> 401 FPS
- 100K pts --> 97 FPS
- 1M pts --> 14 FPS
Vertex Arrays (CPU side):
- 1K pts --> 603 FPS
- 10K pts --> 402 FPS
- 100K pts --> 97 FPS
- 1M pts --> 14 FPS
Vertex Buffer Objects (GPU side):
- 1K pts --> 604 FPS
- 10K pts --> 399 FPS
- 100K pts --> 95 FPS
- 1M pts --> 14 FPS
I rendered the same data with GL_TRIANGLE_STRIP and got similarly indistinguishable results (though slower, as expected, due to the extra rasterization). I can post those numbers too if anybody wants them.
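To put the GL_POINTS numbers in perspective, they can be converted to points per second and to an approximate data rate. The sketch below assumes that "1K" means 1,000 points and that each point carries three 32-bit floats of position plus three of color (24 bytes); both function names are hypothetical.

```cpp
#include <cassert>

// Points drawn per second for a given cloud size and frame rate.
double pointsPerSecond(double pointsPerFrame, double fps) {
    return pointsPerFrame * fps;
}

// Approximate vertex data consumed per second, in megabytes (10^6 bytes).
// Assumes 6 floats per point at 4 bytes each = 24 bytes per point.
double megabytesPerSecond(double pointsPerFrame, double fps) {
    const double bytesPerPoint = 6.0 * 4.0;
    return pointsPerFrame * fps * bytesPerPoint / 1.0e6;
}
```

With the laptop glBegin()/glEnd() figures this gives about 0.6 million points per second at 1K points, 4.0 million at 10K, 9.7 million at 100K and 14 million at 1M. Under the 24-byte assumption, the 1M case corresponds to roughly 336 MB of vertex data per second.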
Question(s)
- What gives?
- What do I have to do to realize the promised performance gain of VBOs?
- What am I missing?