|
Vertex Buffers - Allegro Primitives mach_msg_trap |
rhuanjl
Member #16,740
September 2017
|
(sorry this will be slightly long...)

Brief background
Profiling showed a key slowdown in a function that creates a vertex buffer. It calls al_create_vertex_buffer (providing no initial data, just specifying the number of vertices), then al_lock_vertex_buffer, then writes the vertex data to the buffer, and finally calls al_unlock_vertex_buffer. According to my profiler (Instruments on macOS), the call to al_unlock_vertex_buffer cost 8000 times as much as anything else and accounted for approximately 80% of our total execution time. Unfolding the stack trace showed that the OpenGL calls underneath it always ended up at mach_msg_trap, which was where all the time was being spent.

Initial fix attempted
This (the re-written vbo_upload under Code, which passes the vertex data straight to al_create_vertex_buffer) took away all the delay from al_create_vertex_buffer BUT added that same delay to the first time the buffer was used for a drawing operation. It was only the first time, though; the speed was fine for subsequent draws. This time the profiler showed that the first draw operation, and not the buffer creation, was ending up at mach_msg_trap.

Next idea
A bit more reading tells me that glEnableClientState is deprecated and is meant to have been replaced by glEnableVertexAttribArray, which is called by setup_state within prim_opengl.c before the drawing operation.

Current thoughts

Questions

(If relevant, I'm doing this testing on a MacBook Pro with an Intel Iris Pro graphics card.)

Code
bool
vbo_upload(vbo_t* it)
{
	ALLEGRO_VERTEX_BUFFER* buffer;
	ALLEGRO_VERTEX* entries;
	vertex_t* vertex;

	iter_t iter;

	// release any previously created buffer before building a new one
	if (it->buffer != NULL) {
		al_destroy_vertex_buffer(it->buffer);
		it->buffer = NULL;
	}

	// create the vertex buffer object (no initial data, default vertex format)
	if (!(buffer = al_create_vertex_buffer(NULL, NULL, vector_len(it->vertices), ALLEGRO_PRIM_BUFFER_STATIC)))
		return false;

	// upload vertices to the GPU
	if (!(entries = al_lock_vertex_buffer(buffer, 0, vector_len(it->vertices), ALLEGRO_LOCK_WRITEONLY))) {
		al_destroy_vertex_buffer(buffer);
		return false;
	}
	iter = vector_enum(it->vertices);
	while (iter_next(&iter)) {
		vertex = iter.ptr;
		entries[iter.index].x = vertex->x;
		entries[iter.index].y = vertex->y;
		entries[iter.index].z = vertex->z;
		entries[iter.index].u = vertex->u;
		entries[iter.index].v = vertex->v;
		entries[iter.index].color = nativecolor(vertex->color);
	}
	al_unlock_vertex_buffer(buffer); // <- all delay was here

	it->buffer = buffer;
	return true;
}
Re-written vbo_upload (defers delay to first draw):
bool
vbo_upload(vbo_t* it)
{
	ALLEGRO_VERTEX_BUFFER* buffer;
	vertex_t* vertex;

	iter_t iter;

	// staging copy of the vertices (a variable-length array on the stack)
	ALLEGRO_VERTEX vertices[vector_len(it->vertices)];

	iter = vector_enum(it->vertices);
	while (iter_next(&iter)) {
		vertex = iter.ptr;
		vertices[iter.index].x = vertex->x;
		vertices[iter.index].y = vertex->y;
		vertices[iter.index].z = vertex->z;
		vertices[iter.index].u = vertex->u;
		vertices[iter.index].v = vertex->v;
		vertices[iter.index].color = nativecolor(vertex->color);
	}

	// release any previously created buffer before building a new one
	if (it->buffer != NULL) {
		al_destroy_vertex_buffer(it->buffer);
		it->buffer = NULL;
	}

	// create the vertex buffer object, passing the vertex data as initial data
	if (!(buffer = al_create_vertex_buffer(NULL, vertices, vector_len(it->vertices), ALLEGRO_PRIM_BUFFER_STATIC)))
		return false;

	it->buffer = buffer;
	return true;
}
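One side note on the re-written function: it stages the vertices in a variable-length array, which lives on the stack and could overflow for very large vertex counts. A minimal sketch of a heap-based staging step follows. The names src_vertex_t, staged_vertex_t and stage_vertices are hypothetical stand-ins (not Allegro types or functions) so that the sketch compiles without Allegro; in the real function the destination would be an ALLEGRO_VERTEX array handed to al_create_vertex_buffer as the initial data and freed afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the thread's vertex_t and for ALLEGRO_VERTEX. */
typedef struct { float x, y, z, u, v; unsigned color; } src_vertex_t;
typedef struct { float x, y, z, u, v; unsigned color; } staged_vertex_t;

/* Copies count vertices into a heap-allocated staging array.
 * Returns NULL for an empty input or on allocation failure.
 * The caller owns the result and must free() it. */
static staged_vertex_t *stage_vertices(const src_vertex_t *src, size_t count)
{
    staged_vertex_t *out;
    size_t i;

    if (count == 0 || count > SIZE_MAX / sizeof *out)
        return NULL;
    assert(src != NULL);

    out = malloc(count * sizeof *out);
    if (out == NULL)
        return NULL;

    for (i = 0; i < count; ++i) {
        out[i].x = src[i].x;
        out[i].y = src[i].y;
        out[i].z = src[i].z;
        out[i].u = src[i].u;
        out[i].v = src[i].v;
        out[i].color = src[i].color;
    }
    return out;
}
```

The staging array only needs to live until the buffer has been created, so it can be freed immediately after the create call returns.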
The draw is done using: Here num_vertices is the number of vertices used when creating the buffer, bitmap is a separately specified image used to texture the shape, and vbo_buffer simply returns the relevant buffer. Nothing else edits the buffer. |
beoran
Member #12,636
March 2011
|
ALLEGRO_PRIM_BUFFER_STATIC doesn't seem right to me; ALLEGRO_PRIM_BUFFER_STREAM or one of the other flags might work better. |
rhuanjl
Member #16,740
September 2017
|
beoran said: ALLEGRO_PRIM_BUFFER_STATIC doesn't seem right to me, ALLEGRO_PRIM_BUFFER_STREAM or the other flags might work better.
Thanks for the suggestion. I've just tried it, but unfortunately changing the flags did not seem to produce any gain. Note that the intention is to write to any given buffer only once, in the function vbo_upload shown above, and then to draw it many times; hence the initial choice of ALLEGRO_PRIM_BUFFER_STATIC. |
beoran
Member #12,636
March 2011
|
Hmmm, it seems like this could be an Allegro performance bug on osx. Perhaps the opengl solution you worked out could be helpful in fixing this. However, the mach_msg_trap seems to be a bit of a red herring: |
rhuanjl
Member #16,740
September 2017
|
beoran: thanks for looking at this for me. Some further googling and reading around suggests that I'm not the only person who's had problems on macOS with OpenGL code that uses glEnableVertexAttribArray, and it does sound like it's probably a macOS-specific issue. But what I can't see anywhere is a solution. Thankfully the code runs, and as it's only one delay per VBO it's not disastrous: I can still create a 4-vertex VBO in 0.15 milliseconds, I just think that it should take more like 0.04 or so. I should probably test higher vertex count cases and see if it becomes a more significant issue. And I suppose if I want a fix I need to read some macOS-specific OpenGL 3/4 guides. |
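For the higher vertex count test mentioned above, a small timing harness along these lines might help. It is only a sketch: upload_fn, mean_upload_ms, sweep_vertex_counts and count_only_upload are hypothetical names, the vertex counts are arbitrary, and the placeholder workload would need to be replaced by a callback that really builds and destroys a VBO. It uses wall-clock time (C11 timespec_get) rather than clock(), on the assumption that time spent blocked inside mach_msg_trap would not show up as CPU time; in an Allegro program al_get_time() could serve the same purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: create (and destroy) one VBO with num_vertices vertices. */
typedef void (*upload_fn)(int num_vertices, void *user);

/* Milliseconds elapsed between two wall-clock samples. */
static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1000.0
         + (double)(b.tv_nsec - a.tv_nsec) / 1.0e6;
}

/* Runs fn repeats times and returns the mean wall-clock cost per call in ms. */
static double mean_upload_ms(upload_fn fn, void *user, int num_vertices, int repeats)
{
    struct timespec start, end;
    int i;

    assert(fn != NULL && repeats > 0);
    timespec_get(&start, TIME_UTC);
    for (i = 0; i < repeats; ++i)
        fn(num_vertices, user);
    timespec_get(&end, TIME_UTC);
    return elapsed_ms(start, end) / repeats;
}

/* Prints the mean cost for a few arbitrary vertex counts. */
static void sweep_vertex_counts(upload_fn fn, void *user)
{
    static const int counts[] = {4, 64, 1024, 16384};
    size_t i;

    for (i = 0; i < sizeof counts / sizeof counts[0]; ++i)
        printf("%6d vertices: %.4f ms per upload\n",
               counts[i], mean_upload_ms(fn, user, counts[i], 1000));
}

/* Placeholder workload that only counts calls; swap in a real upload. */
static void count_only_upload(int num_vertices, void *user)
{
    (void)num_vertices;
    ++*(int *)user;
}
```

Calling sweep_vertex_counts(count_only_upload, &calls) with an int calls = 0 exercises the harness; the numbers only become meaningful once the callback performs the real upload.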
SiegeLord
Member #7,827
October 2006
|
I am a bit confused by your setup. Are you continually creating vertex buffers? They are meant to be created once and reused multiple times.
"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18 |
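The create-once, reuse-many pattern described here can be sketched with a dirty flag that triggers a rebuild only when the CPU-side vertex data has changed. This is an illustration under assumptions, not code from the thread: cached_vbo_t, rebuild_fn, cached_vbo_prepare and count_rebuild are hypothetical names, and the rebuild callback stands in for something like vbo_upload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around a GPU buffer and its CPU-side state. */
typedef struct {
    bool dirty;   /* set whenever the CPU-side vertex list changes */
    int  uploads; /* how many times the buffer has been (re)built */
} cached_vbo_t;

/* Hypothetical rebuild callback, standing in for an upload function. */
typedef bool (*rebuild_fn)(cached_vbo_t *vbo);

/* Call before every draw: rebuilds only when the data has changed. */
static bool cached_vbo_prepare(cached_vbo_t *vbo, rebuild_fn rebuild)
{
    assert(vbo != NULL && rebuild != NULL);
    if (!vbo->dirty)
        return true;   /* buffer is current, reuse it as-is */
    if (!rebuild(vbo))
        return false;  /* keep the dirty flag so the next draw retries */
    vbo->dirty = false;
    return true;
}

/* Placeholder rebuild that only counts how often it runs. */
static bool count_rebuild(cached_vbo_t *vbo)
{
    vbo->uploads++;
    return true;
}
```

With this shape, the one-off cost discussed in the thread is paid once per change to the vertex data rather than once per frame.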
rhuanjl
Member #16,740
September 2017
|
@SiegeLord: I wouldn't normally be continually creating VBOs; I'm well aware that they're designed to be made once and used many times. To test performance I had the code create 20,000 VBOs and then draw each of them 10 times, which is where the time measurements I've mentioned come from. On the MacBook Pro I'm using, creating the 20,000 VBOs takes 3-3.5 seconds with the original version of the function and around 0.5 seconds with the edited version. With the original version the draw operations take about 1 second (all 200,000 draws). With the edited version the first draw operation for each VBO (i.e. the first 20,000 operations) collectively takes 3-4 seconds, with the remaining 180,000 taking < 1 second. Conversely, when I added glEnableClientState(GL_VERTEX_ARRAY) to the relevant line within al_create_vertex_buffer as well as swapping to the alternate loading function, the creation process dropped to 0.5 seconds and all the draws together took around 1.1 seconds. |
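To relate these totals to the per-VBO figures mentioned earlier in the thread, the averages can be worked out by dividing each total by the number of operations. The sketch below does just that; per_item_ms and print_vbo_averages are hypothetical helper names, and the inputs are the totals quoted above (20,000 VBOs, 200,000 draws in total).

```c
#include <assert.h>
#include <stdio.h>

/* Average cost per operation in milliseconds, given a total in seconds. */
static double per_item_ms(double total_seconds, long count)
{
    assert(count > 0);
    return total_seconds * 1000.0 / (double)count;
}

/* Prints the per-operation averages implied by the quoted totals. */
static void print_vbo_averages(void)
{
    const long vbos = 20000;       /* VBOs created in the test */
    const long all_draws = 200000; /* 10 draws per VBO */

    printf("create, lock/unlock version: %.3f to %.3f ms per VBO\n",
           per_item_ms(3.0, vbos), per_item_ms(3.5, vbos));
    printf("create, initial-data version: %.3f ms per VBO\n",
           per_item_ms(0.5, vbos));
    printf("first draw, initial-data version: %.3f to %.3f ms per VBO\n",
           per_item_ms(3.0, vbos), per_item_ms(4.0, vbos));
    printf("all draws, lock/unlock version: %.3f ms per draw\n",
           per_item_ms(1.0, all_draws));
}
```

The 3-second total for 20,000 creations works out to 0.15 ms per VBO, which matches the 4-vertex figure given earlier, and the 0.5-second total corresponds to 0.025 ms per VBO.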
|