Hey guys,
I recently compiled A5.2 from GIT and when I run ex_threads.exe it crashes. Here's the log and backtrace :
I'm investigating this but I haven't found the cause yet. I looked at _al_d3d_generate_display_format_list and _al_vector_alloc_back but so far I haven't really found anything obvious.
From the addresses and the warnings, it looks like it is trying to write past the allocated memory for eds_list->_items in frame #10.
What's weird is that it is crashing in realloc. Can't explain that one.
EDIT
Okay, so I've been doing some debugging, and I've been looking at _al_d3d_generate_display_format_list and I'm getting some odd values that I can't explain. On line 90 of d3d_display_formats.cpp, a triply nested loop appears. When it crashes, i, j, and k are all zero, which should indicate it is the first time through each loop. The weird thing is though, that eds_list._size is already 3 and eds_list._unused is already 1, which would indicate al_vector_alloc_back has been called three times, as well as realloc. How can a single line of code in the third level of the loop have been called 3 times when each loop index is zero? It makes no sense. That's as far as I've gotten debugging this.
ex_threads creates several windows, so that loop is executed several times in parallel - but I didn't actually look at the code.
I think then that the problem is this variable :
Since eds_list is not part of TLS it gets shared among the threads in ex_threads.cpp, which are creating displays in parallel. So eds_list's contents get reallocated in one thread while another thread has an old memory address for the vector's contents and then it accesses memory that is no longer valid.
In addition, there are two other static variables in the _al_d3d_generate_display_format_list function. I'm not sure what the comment "stop [the] warning" means or what it is for. It looks like they are there to prevent the function from re-generating the display format list if it has already been created and the adapter and fullscreen variables match the new display options.
So I think that if eds_list, fullscreen, and adapter are moved into TLS then ex_threads will be able to run safely, since then each thread will have its own display settings list.
Or simply not make them global, thread local or not?
I haven't looked at the code though so maybe there is a reason those are global - but my first try at fixing it would be to simply not make those variables global.
fullscreen and adapter need to persist between function calls, as well as the eds_list. There should be one instance per thread, because each time you create a display they change, and a single thread can have multiple displays.
Edit
Well I guess technically that's not true. You could recreate the list each time. However, I think the point was to re-use it.
Edit2
Actually, I don't know. Multiple functions work on the same data, but that could be solved by passing a reference. It would force you to fully re-create the display list each time you create a display though. I don't know how expensive that is. The list only needs to be re-created when fullscreen or adapter change.
Edit3 04/28/2016 1:00AM
Do you guys want me to work on a patch for this? Right now I'm leaning towards moving eds_list, fullscreen, and adapter into TLS. Or else we could just recreate the list every time _al_d3d_generate_display_format_list is called. Thoughts? I'm willing to work on this if you want. Otherwise, ex_threads is broken on Direct3D.
I would say re-create the list every time - but I haven't looked at the code. My gut feeling just is that it's always bad to keep stuff around in some global variable - and this is not something that should take a long time nor something that will happen very often, so I'd say re-create it when needed.
If you make the patch to just move it to TLS that probably also is acceptable though
I think I will add it to TLS. I'll do it here in the next few days or so.
Edit
Okay, I'm in the middle of adding it to TLS. How should I interface with the thread local storage though? tls_get is not accessible outside of tls.c. Should I make accessor functions? Or can I export tls_get somehow? It seems silly to have three accessor functions just to access three variables in tls.