Avoiding critical sections or shared data access whenever possible is key to successful multithreaded programming. After a recent discussion in another thread here I decided to drop multithreading in my game, but now I have enough good reasons to switch back to it. Anyways, I had the following idea to handle game modes/modules in my game:
This is a very abstract example, but the idea here is to avoid the necessity of mutexes by making it impossible for the rendering thread to even access any graphics before they're ready. I don't really trust Allegro mutexes; the reason can be seen in this old thread. I don't know if I just used them the wrong way or if they weren't fully functional (and I would love it if someone with more experience could clarify), but anyways my question is the following: Does the above idea work? Or in other words: When does the new operator actually return an address? Is it like a function call, which only returns a value once the function itself returns (so in this case after the constructor is done), or does it return an address the moment memory is allocated? In the first case my solution would work because gameModeObject could only become != NULL after all resources are loaded. In the second case Display() could be called without all resources being loaded and therefore cause errors. In this case I would need mutexes.
Now for a second question. I might load my resources in the logic thread (for better resource management and stuff). This, of course, will load them as memory bitmaps, since the display is created by the rendering thread. My question concerns al_clone_bitmap(). I read you can use it to easily convert memory bitmaps into video bitmaps. But how fast is it? We all know that loading graphics from disk is a rather slow process, but al_clone_bitmap() as far as I understand works with memory only, so it should be a rather fast process, am I right? If it's actually pretty slow then I might have to load my resources on the display thread after all, which would defeat one of the reasons I want to switch back to multithreading in the first place. Basically I want to be able to show an animated loading screen without extreme programming effort. This way I could make the logic thread load resources (which takes quite a while) while the display thread animates and displays a logo or something.
A third question concerns multithreading on systems with only one processor core. Does anyone know how multiple threads behave on such a system? From what I know each platform has a scheduler which tries to give each thread about the same processor time. This behaviour is mandatory for having my game work the way I want it to (and for benefitting from one of the other reasons I'm switching back to multithreading). If you have read one of my recent threads you probably know that I'm aiming for the following game structure:
Logic thread: Constant FPS (25)
Rendering thread: Variable FPS with interpolation (letting it run through as often as possible or letting VSync dictate the speed)
What I want is that the logic thread, since it has only 25 FPS, can always process all of its frames even on slow hardware, whereas the rendering thread's FPS gets reduced on slow hardware to allow the logic thread to always get the needed CPU power (while on fast hardware the interpolation makes a smooth image possible). However, since I don't really know how to program this kind of speed control optimally I thought I could just let the OS handle everything and for this I need to know how singlecore systems behave. I know that my approach should work on systems with at least two cores, since both threads would always run on different cores. But can I assure that my logic thread always gets the time it needs even on singlecore systems (assuming the OS isn't slowed down by other processes)?
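The timing side of this structure (fixed logic rate, interpolated rendering) can be sketched roughly as follows. This is only a minimal illustration with hypothetical names (`kLogicStep`, `advance`, `interpolate`); it covers the arithmetic, not the threading, and it does not use any Allegro calls.

```cpp
// Hypothetical constant: 25 logic updates per second, as described above.
constexpr double kLogicStep = 1.0 / 25.0;

struct StepResult {
    int    logicSteps;  // how many fixed logic updates are due right now
    double alpha;       // how far we are into the next step, in [0, 1)
};

// Adds the elapsed wall-clock time to the accumulator and reports how many
// fixed logic steps are due, plus the interpolation factor for rendering.
StepResult advance(double& accumulator, double elapsedSeconds) {
    accumulator += elapsedSeconds;
    int steps = 0;
    while (accumulator >= kLogicStep) {
        accumulator -= kLogicStep;
        ++steps;
    }
    return {steps, accumulator / kLogicStep};
}

// Linear interpolation between the previous and the current logic position.
double interpolate(double previous, double current, double alpha) {
    return previous + (current - previous) * alpha;
}
```

On slow hardware `advance` simply reports more than one logic step per rendered frame, so the logic rate stays constant while the rendering rate drops.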
Whew. These questions were definitely hard to word. I hope you always get my point.
Have you thought about the possibility of having the mutexes set a flag instead of trying to control big blocks? Kind of like the "occupied" flag in a public toilet.
You might also google for "processor affinity" to see how you can restrict code to specific cores.
Have you thought about the possibility of having the mutexes set a flag instead of trying to control big blocks? Kind of like the "occupied" flag in a public toilet.
Hmm... can you give a simple example for this? I'm not sure if I get the point.
You might also google for "processor affinity" to see how you can restrict code to specific cores.
Alright. Anyways, will this work with Allegro's threads?
can you give a simple example for this?
I did have something like this for a mandelbrot set and a cpu limiter but I can't find them on this terabyte disk ATM.
Anyway, you'd just lock a flag instead of some big array of data and the other thread would check this flag. This wouldn't work if the other thread was blocked just trying to read it.
Alright. Anyways, will this work with Allegro's threads?
I have no idea. I read about it a few years ago and mentioned it here (?) thinking separate threads would benefit from being on different cores to avoid reloading the L1 cache so much, but I was told that the OS can do a better job anyway.
Anyway, you'd just lock a flag instead of some big array of data and the other thread would check this flag. This wouldn't work if the other thread was blocked just trying to read it.
OK, I think I get what you mean. But in my case I actually want a thread to wait when another is accessing shared data (or at least I don't mind it).
Basically my main goal is completely safe access of shared data. I always thought that you do it by having the following code in both threads:
From what I read that's also how it's supposed to work, but as you can see in that old thread I linked in the introduction post that doesn't or didn't always work for me. The error I got back in that old program I can only explain to myself being caused by two threads accessing the graphics at the same time.
In the RTS game I am developing, each AI player has its own thread to "think". Each "thinking" thread gets a pointer to the object representing the AI player in the world. To build structures, move its units and so on, it asks the world thread to do it on its behalf, so that there is no runtime error. Each AI thread sets some values in a struct and then flips a bool. The world thread checks the flag first to determine whether it is safe to access the struct. If it is, the world thread reads the values, executes the required behavior, then resets all the values and flips the bool back to tell the AI player that set it that it can make another request.
For example:
struct message
{
    bool sending;
    // other fields
};

// world thread
if (this->message.sending) // the AI thread is trying to tell us something!!!
{
    // process struct
    // reset struct
    this->message.sending = false;
}

// AI thread
// com is an abbreviation for commander and is the pointer to the commander
// object in the world
if (!com->message.sending) // no message is pending
{
    // it's safe to start filling in fields to make the request
    // flip the flag
    com->message.sending = true;
}
Hope this example helps
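For readers who want to try the flag handshake above, here is a hedged sketch using `std::atomic<bool>` from the C++ standard library instead of a plain `bool`. That swap is an assumption on my part, not what the RTS code does: with a plain `bool` the compiler and CPU are free to reorder the field writes around the flag. The field names and the `runDemo` helper are hypothetical, and the sketch assumes exactly one sender and one receiver per struct.

```cpp
#include <atomic>
#include <thread>

// Hypothetical request record; the fields are placeholders.
struct Message {
    std::atomic<bool> sending{false};  // true while a request is pending
    int unitId = 0;
    int targetX = 0;
};

// AI thread side: fill the fields only when no request is pending, then
// publish with a release store so the world thread sees the filled fields.
bool trySend(Message& m, int unitId, int targetX) {
    if (m.sending.load(std::memory_order_acquire)) return false;
    m.unitId = unitId;
    m.targetX = targetX;
    m.sending.store(true, std::memory_order_release);
    return true;
}

// World thread side: read the fields only while the flag is set, then
// clear the flag to hand the struct back to the AI thread.
bool tryReceive(Message& m, int& unitId, int& targetX) {
    if (!m.sending.load(std::memory_order_acquire)) return false;
    unitId = m.unitId;
    targetX = m.targetX;
    m.sending.store(false, std::memory_order_release);
    return true;
}

// Sends `count` requests from a second thread and sums the ids that arrive.
int runDemo(int count) {
    Message m;
    int sum = 0;
    std::thread ai([&] {
        for (int i = 1; i <= count; ++i)
            while (!trySend(m, i, i * 10)) std::this_thread::yield();
    });
    for (int received = 0; received < count;) {
        int id = 0, x = 0;
        if (tryReceive(m, id, x)) { sum += id; ++received; }
        else std::this_thread::yield();
    }
    ai.join();
    return sum;
}
```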
Yeah, I think I get what you mean, but I also think that this isn't really helpful for my particular situation. Remember that I'm using interpolation and that therefore my rendering thread gets called a lot more often than the logic thread. If I got it right then in your example each thread accesses the data only once, then sends a message and waits for the other thread to send a message. However, in my example the rendering thread may actually be called many times between each logic frame and of course I want a smooth output and therefore a new image each frame, so while this method in general seems to be useful it won't help me for my particular situation.
Basically my main concern is this: How do you use Allegro mutexes the right way to prevent, with 100% safety, two threads from accessing the same data at the same time? As I said, the example I gave above doesn't really seem to work for me. :/
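As a general illustration of the locking rule being asked about, here is a small sketch using `std::mutex` from the C++ standard library rather than Allegro's mutexes (that substitution is an assumption made so the example is self-contained; Allegro 5 offers the analogous `al_create_mutex`, `al_lock_mutex` and `al_unlock_mutex`). The key point is that every access to the shared data, reads included, takes the same mutex. The names `SharedState`, `addMany` and `runTwoWriters` are hypothetical.

```cpp
#include <functional>
#include <mutex>
#include <thread>

struct SharedState {
    std::mutex lock;   // guards every field below
    long counter = 0;
};

// Each increment happens while the lock is held; the guard releases the
// lock automatically when it goes out of scope.
void addMany(SharedState& s, int times) {
    for (int i = 0; i < times; ++i) {
        std::lock_guard<std::mutex> guard(s.lock);
        ++s.counter;
    }
}

// Two threads modify the same counter; the result is exact only because
// both of them go through the same mutex.
long runTwoWriters(int timesEach) {
    SharedState s;
    std::thread a(addMany, std::ref(s), timesEach);
    std::thread b(addMany, std::ref(s), timesEach);
    a.join();
    b.join();
    std::lock_guard<std::mutex> guard(s.lock);
    return s.counter;
}
```

If even one code path touches the data without locking, the mutex protects nothing, which is one possible explanation for crashes that look like the mutex "not working".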
One remark: 25 logical updates per second seems not enough.
If you use a 'state' for keyboard/joystick input ("is 'up' being pressed" etc.) it may be possible to press and release a button very quick between two logic updates, so it would completely miss the press.
In practice, I've never been physically able to produce such problem when testing with 50 logical updates per second, but maybe at 25...
Isn't that why A5 moved to an event system?
Even if the library gives event-based API, it's not a bad idea to handle your own non-volatile "state". It can avoid problems like:
Such code makes the assumption that the key[] array stays unchanged during execution. But if you're extremely unlucky, you can get a jump-left by pressing up-right and releasing the right key at the wrong moment.
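One way to avoid that kind of inconsistency is to copy the live key state once per logic update and make every decision from the copy. The sketch below only illustrates that idea; `LiveKeys`, `KeySnapshot`, `Action` and the decision rules are hypothetical and not taken from the code discussed above.

```cpp
#include <atomic>

// Live state, updated by the input handler at any moment.
struct LiveKeys {
    std::atomic<bool> up{false}, left{false}, right{false};
};

// Plain copy that cannot change while the logic update is running.
struct KeySnapshot { bool up, left, right; };

enum class Action { None, JumpUp, JumpLeft, JumpRight };

// Reads each key exactly once at the start of the logic update.
KeySnapshot takeSnapshot(const LiveKeys& live) {
    return {live.up.load(), live.left.load(), live.right.load()};
}

// Decides using only the snapshot, so a key released in the middle of the
// function cannot make two tests disagree with each other.
Action decide(const KeySnapshot& k) {
    if (k.up && k.right) return Action::JumpRight;
    if (k.up && k.left)  return Action::JumpLeft;
    if (k.up)            return Action::JumpUp;
    return Action::None;
}
```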
If you use a 'state' for keyboard/joystick input ("is 'up' being pressed" etc.) it may be possible to press and release a button very quick between two logic updates, so it would completely miss the press.
I have already thought about this, but it seems to work quite well so far and in the worst case all I have to do is edit a constant in my code to get higher FPS. Still, 25 seems to be more than enough. Consider this: If a player could press and release a button between two frames then that would mean that in my case a player could also press a button at least 25 times a second. However, even if you're moving your finger really fast you'll find that you won't be able to press the same button more than about ten times a second. Therefore - during normal play - I doubt that this will ever happen. The player would have to press the button for less than 0.04 seconds, but I think a typical button press takes about 0.10 to 0.15 seconds.
Well, whatever. Any advice on the actual problem with the mutexes?
I don't see the point of the boolean gameModeInitialized, since only one thread performs LogicThread() you don't risk concurrent initialization.
I can confirm that in a line like gameModeObject = new CGameModeDerivedClass(),
the variable gameModeObject will be set only after the constructor has finished, so the check !=NULL in DisplayThread() will ensure initialization is complete.
But you still risk other concurrency problems: DisplayThread() will have to work even when the game state gets modified at the worst possible moment.
I don't see the point of the boolean gameModeInitialized, since only one thread performs LogicThread() you don't risk concurrent initialization.
Well, this is only a very abstract example. Imagine that LogicThread() runs in a loop.
I can confirm that in a line like gameModeObject = new CGameModeDerivedClass(),
the variable gameModeObject will be set only after the constructor has finished, so the check !=NULL in DisplayThread() will ensure initialization is complete.
Alright, thanks. Good to know!
But you still risk other concurrency problems: DisplayThread() will have to work even when the game state gets modified at the worst possible moment.
I'm not sure if we mean the same thing here, but I may just back up important variables (layer positions, sprite positions) at the beginning of each rendering frame to prevent the data from changing mid-frame. Not all is set in stone, though, and I don't think it'll ever lead to serious problems (only minor sprite misplacements). Unless you mean something different.
backup important variables (layer positions, sprite positions) at the beginning of rendering frame
Yep. And even this backup should be mutexed, to avoid the risk that Display() starts "consuming" it before Update() has finished "producing" it.
If you find it very constraining, indeed it is. Personally I find it easier (less code) and safer (less risk) to run things in sequence. You could even find it faster, since your program handles fewer different things at the same time, causing fewer cache misses.
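The mutexed backup described above could look roughly like this. It is a sketch with hypothetical names (`World`, `Sprite`, `snapshot`) that uses `std::mutex` instead of Allegro's mutexes so that it is self-contained: the logic thread modifies the sprite list under the lock, and the display thread copies the list under the same lock and then draws from its private copy.

```cpp
#include <mutex>
#include <vector>

struct Sprite { float x, y; };

class World {
public:
    // Logic thread: every modification of the list holds the lock.
    void addSprite(Sprite s) {
        std::lock_guard<std::mutex> guard(lock_);
        sprites_.push_back(s);
    }
    void moveAll(float dx) {
        std::lock_guard<std::mutex> guard(lock_);
        for (Sprite& s : sprites_) s.x += dx;
    }
    // Display thread: copy everything needed for one frame while holding
    // the lock, then draw from the copy without holding it.
    std::vector<Sprite> snapshot() const {
        std::lock_guard<std::mutex> guard(lock_);
        return sprites_;
    }
private:
    mutable std::mutex lock_;
    std::vector<Sprite> sprites_;
};
```

The lock is only held for the duration of the copy, so the logic thread is never blocked for a whole rendered frame.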
I don't think it'll ever lead to serious problem
Concurrency is all about worst-case scenarios, no matter how unlikely :-) If there is a 0.01% chance that something bad happens each time Logic() runs, statistically it will happen several times every game hour (at 25 updates per second, Logic() runs 90,000 times per hour).
Dangerous cases are for example when you have a dynamic list of game elements to draw, and the logic modifies the list (insert new item, remove destroyed item) while the display is iterating on it.
Yeah, I know what you mean. It's a well-known issue. What I meant was just that a sprite displayed incorrectly for a single frame isn't too serious in my opinion. But yeah, you're right. Since a linked list is a pretty critical section I might as well just protect them all with a single mutex.
Well whatever, I have another question now. It's related to timers in multithreaded applications. Just to try it out I rewrote my application for multithreading. Rendering runs on the main thread, logic on a separate thread. Now here is the problem: I turned off VSync (which means ~3000 rendering frames a second) to see how my application behaves in this case and weirdly - although logic runs on a separate thread - this causes extreme input lag and also leads to weird audio behaviour. My guess is that all Allegro functions (timers, audio, input etc.) are still using the main thread, which is now extremely busy. Am I right about this? How can I fix this? Maybe by one of these methods?
-Switch logic to main thread and rendering to the other thread?
-Create timer on logic thread instead of main thread?
-Initialize plug-ins on the logic thread?
My guess is that the first could work, but maybe one of you knows for sure?
EDIT:
Of course I could just install a timer for the display thread (for all graphics cards that don't support VSync), but that's what I want to avoid so that I can always just get the maximum rendering rate.
The fourth option is for your display thread to rest() a minimum amount of time after each drawn image: If you've just sent an image to the screen, it seems useless to modify it less than, for example 10ms later.
Hey, you have a point there. Just putting an al_rest(0) or al_rest(0.01) should fix this while not causing too much trouble. Will try this immediately.
EDIT:
I used al_rest(0.004) and it seems to be pretty responsive now. Another nice side effect: This automatically limits the FPS to 250 (useful for those people without VSync). Didn't figure that setting up an FPS limit is this easy.
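The effect of resting after each frame can be sketched like this. The example uses `std::this_thread::sleep_for` as a stand-in for `al_rest()` so that it runs without Allegro; `runLimitedLoop` and the commented-out `drawFrame()` are hypothetical.

```cpp
#include <chrono>
#include <thread>

// Runs `frames` iterations of a stand-in render loop, sleeping after each
// one, and returns the total elapsed time in seconds.
double runLimitedLoop(int frames, std::chrono::milliseconds restPerFrame) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i) {
        // drawFrame() would go here (hypothetical).
        std::this_thread::sleep_for(restPerFrame);  // gives the CPU back
    }
    return std::chrono::duration<double>(clock::now() - start).count();
}
```

With a 4 ms rest the loop cannot exceed 250 iterations per second, and the time spent sleeping is available to the logic, input and audio threads.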
Now I just hope that this doesn't have negative side effects on graphics cards WITH VSync and a monitor with high frequency. But oh well. I doubt that people will ever have more than 100 Hz, anyways.
EDIT:
And now I've switched to using a 250 FPS display timer, anyways. That's just more convenient and doesn't suck up all the processor time. Wonder why I didn't do this in the first place. Guess brain fart or something. I just didn't think about using the timer only to set a maximum frame rate.
Now the only thing that still remains is the mutex problem. Isn't there anyone here who knows more about this? Have I just been using them the wrong way all this time? And if so: How do I properly use them?
When does the new operator actually return an address?
The variable gameModeObject will be set to the address of the allocated object right after the constructor finishes executing.
However, changes may not be seen immediately by the two threads. One thread might see 'null' while the other might see the actual variable. That will happen only once though.
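In standard C++ (C++11 and later), one way to make that visibility explicit is to publish the pointer through `std::atomic` with release/acquire ordering. The following is a sketch under that assumption, not the original poster's code; `GameMode` and `publishAndRead` are hypothetical names, and the constructor only stands in for resource loading.

```cpp
#include <atomic>
#include <thread>

struct GameMode {
    int resourcesLoaded = 0;
    GameMode() { resourcesLoaded = 42; }  // stands in for loading resources
};

// Publishes a fully constructed object to a second thread and returns the
// value that thread observed.
int publishAndRead() {
    std::atomic<GameMode*> gameModeObject{nullptr};
    int seen = 0;
    std::thread display([&] {
        GameMode* mode = nullptr;
        // Spin until the pointer is published; this acquire load pairs
        // with the release store below.
        while ((mode = gameModeObject.load(std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        seen = mode->resourcesLoaded;
    });
    // Logic thread: construct completely first, then publish the address.
    GameMode* created = new GameMode();
    gameModeObject.store(created, std::memory_order_release);
    display.join();
    delete created;
    return seen;
}
```

The release store guarantees that everything the constructor wrote is visible to a thread that sees the non-null pointer through the acquire load.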
But can I assure that my logic thread always gets the time it needs even on singlecore systems
Nope. If the two threads have the same priority, they will get an equal time slice out of the CPU.
Your approach to multithreading can be improved by using an event system to notify each thread about changes in the other thread. By using an event system, you don't need mutexes, and if you don't share any data between the threads, you'll have no synchronization problems.
Nope. If the two threads have the same priority, they will get an equal time slice out of the CPU.
But doesn't that suffice? If both threads get exactly x milliseconds each call, but the logic thread needs a lot less processing time than the rendering thread (due to its lower FPS), isn't it likely that the logic thread will be able to work properly even when the display thread is slowed down? If that's not the case: Is there any way to handle priority with Allegro threads?
Your approach to multithreading can be improved by using an event system to notify each thread about changes in the other thread.
To me this actually seems to be the more complex way. I'd basically have to consider each sprite, each layer, the camera position and some other values and add them to the event system. It seems easier and more efficient to me to just use mutexes.
and if you don't share any data between the threads, you'll have no synchronization problems.
Of course, but that's out of question in my situation. I have to process levels in the logic thread and then render them in the rendering thread, so there is no way around sharing some data.
When you pass a message, you're supposed to pass enough data so the recipient can do its job. The passed data is not actually "shared".
IMO it's not a question of thread priority: a system unable to display 250 times per second will be overloaded if you insist on asking it. It's your job to skip frames in order to adapt to the user's machine. The vsync helped you because it automatically stopped your thread in a non-busy way, giving back the free time to all other system tasks (including your logic).
When you pass a message, you're supposed to pass enough data so the recipient can do its job. The passed data is not actually "shared".
OK, I see what you mean now. Actually yeah, I guess I could do something like that. Yeah, that's a pretty neat idea now that I think about it. I just have to handle it right.
IMO it's not a question of thread priority: a system unable to display 250 times per second will be overloaded if you insist on asking it. It's your job to skip frames in order to adapt to the user's machine. The vsync helped you because it automatically stopped your thread in a non-busy way, giving back the free time to all other system tasks (including your logic).
Yeah. I would love to just use VSync on all systems (since it's pointless to render more frames than the monitor can display, anyways), but unfortunately some graphics cards just don't support it. The 250 FPS is only an emergency solution for those systems. It is a lot, yeah, but I just don't know how high modern monitors go. However, recently on TV I saw a commercial for a 200 Hz TV and I want to make sure that all monitors will benefit from my game structure (since I'm putting so much effort into it to begin with).
EDIT:
Well... I just found out there are TVs that actually support up to 1600 Hz. So yeah, **** that! I'll just stick to the 250 FPS for now, although even 100 is probably enough. To the player 200 FPS vs 100 FPS won't make much difference, anyways. Even 100 FPS vs 60 FPS will already be barely noticeable.
EDIT:
A last question before this thread disappears into the void: I read somewhere that Allegro's events (or all functions for that matter) are completely thread-safe. Is this true? So can I - for example - use al_peek_next_event() on one thread and al_flush_event_queue() on the other thread without using mutexes and without fear of breaking my program? This would indeed make everything a lot easier.
I don't think you should try to do something like that. Each thread should ideally get its own event queue and listen to the events it needs.
But that defeats the purpose of it. I want to use the event system to communicate between the two threads. Basically I want the logic thread to copy all the data the rendering thread needs, then store it in the event queue and remove the previous event from the queue. The rendering thread then always peeks at the last event and uses it to base the current frame on.
If the render thread has an event queue that listens to a user event source from the logic thread, you can send events from the logic thread to the render thread through al_emit_user_event. The queue in the render thread will automatically get the event.
Those TVs with such high framerates are meant to be used for stereoscopic 3d anyway.
I don't think there is really any point in going far beyond 120 Hz for 2D.
From what I've read the point of 100Hz or 120Hz monitors is different... a large amount of movies uses 24 FPS. On a 60Hz monitor that can't be smoothly displayed without stutter since not each frame can be displayed for the same time (unless you do some kind of interpolation - which leads to different artifacts). On 100 Hz or 120 Hz it looks completely smooth however (each single source frame is displayed for 4 or 5 frames respectively).
Yeah, I reduced the timer to 100 FPS now. Seems to be the best solution. Anything above is overkill, really.
If the render thread has an event queue that listens to a user event source from the logic thread, you can send events from the logic thread to the render thread through al_emit_user_event. The queue in the render thread will automatically get the event.
That is exactly what I intend to do. I don't know if I gave a different impression. All I need to know is whether event queues are thread-safe (in other words: whether one thread can edit/fill an event queue while another is reading it) or whether I need to protect them from simultaneous access. Of course I could read AND clear the event queue in the rendering thread, but in my opinion that's inconvenient, so as long as it's not unsafe I prefer to clear the event queue in the logic thread. Or in other words: Whenever the logic thread emits a new user event, I want it to clear the previous one afterwards (probably with al_get_next_event(), whereas I use al_peek_next_event() in the rendering thread).
Event queues are thread safe.
Alright, thanks. That's all for now.
Even if the allegro functions are thread safe, the way you describe isn't. You can't have two threads read the same queue, each at their own speed.
I'll always make sure that the rendering thread has at least one valid event to work with (since I'll use al_peek_next_event()), so as long as simultaneously reading/editing the queue itself doesn't cause errors I'll be fine.