Allegro 5 performance |
Andrew Gillett
Member #15,868
January 2015
|
I've been developing a game using Allegro 4. Recently I tried switching to Allegro 5 due to some issues which I'll write about in a separate post. However, the performance is much, much worse.

In Allegro 4 I was writing all graphics to a memory bitmap, then using AllegroGL to scale the backbuffer to the screen. In Allegro 5 I'm using al_draw_scaled_bitmap to display the backbuffer. With memory bitmaps it's incredibly slow, but even with video bitmaps it's far too slow.

Comparison of profiling results (render1 = write sprites to backbuffer, render2 = scale backbuffer to screen):
Allegro 4
Allegro 5, new bitmap flags = ALLEGRO_ALPHA_TEST
As above but also with ALLEGRO_NO_PRESERVE_TEXTURE
With ALLEGRO_OPENGL, but without ALLEGRO_NO_PRESERVE_TEXTURE
Turned off alpha channel (will be ok for most sprites)

Attached is a screenshot of the level. This level is much bigger than the standard one, but even on smaller levels the performance is terrible. The tile sprites, which make up the majority of the sprites in this image, are 32x32. |
dthompson
Member #5,749
April 2005
|
This sounds symptomatic of memory bitmaps being used elsewhere (i.e. as part of your pipeline, not with texture preservation). Allegro 5 isn't good at memory bitmap stuff. So: are you using memory bitmaps anywhere in your Allegro 5 code when you say "even with video bitmaps it's far too slow"? |
Andrew Gillett
Member #15,868
January 2015
|
There are only 2 calls to al_create_bitmap and 1 call to al_load_bitmap in the entire codebase, and al_set_new_bitmap_flags is called before each of them. If I call al_get_bitmap_flags before each draw, I get 0x410 (ALLEGRO_VIDEO_BITMAP | ALLEGRO_ALPHA_TEST) for most of them, but the display gets 0x400 (ALLEGRO_VIDEO_BITMAP). |
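The flag check described in this post can be sketched in plain C. The values 0x400 and 0x10 are the ones quoted above; the memory-bitmap bit (0x1) and the helper name `is_memory_bitmap` are assumptions made for illustration. A real program would use the constants from the Allegro headers rather than redefining them.

```c
#include <stdbool.h>

/* Stand-ins for Allegro's bitmap flag constants. 0x0400 and 0x0010 are the
 * values quoted in the post; 0x0001 for memory bitmaps is an assumption.
 * Real code should use the names from <allegro5/allegro.h> instead. */
enum {
    FLAG_MEMORY_BITMAP = 0x0001, /* assumed value */
    FLAG_ALPHA_TEST    = 0x0010,
    FLAG_VIDEO_BITMAP  = 0x0400
};

/* Hypothetical helper: true if the flags (as returned by
 * al_get_bitmap_flags) have the memory-bitmap bit set. */
bool is_memory_bitmap(int flags)
{
    return (flags & FLAG_MEMORY_BITMAP) != 0;
}

/* Hypothetical helper: true if the video-bitmap bit is set. */
bool is_video_bitmap(int flags)
{
    return (flags & FLAG_VIDEO_BITMAP) != 0;
}
```

Under these assumptions, both 0x410 and 0x400 report as video bitmaps and neither reports as a memory bitmap, which matches what the post concludes.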
Chris Katko
Member #1,881
January 2002
|
It's likely you're doing something wrong. Weird drawing code. Locking bitmaps (which forces copying to memory). Etc. I get easily 100x performance with Allegro 5 with a similar tiled bitmap game on a tiny netbook. Is this Linux, Windows, or Mac OS X? |
MikiZX
Member #17,092
June 2019
|
I believe you have likely done some testing but just in case this can help - if you are working with tilemaps and Allegro5 then a good solution to speed things up (considerably) would be to modify your tilemap drawing code to use vertex buffer objects (you can check a non-optimized version of this here https://github.com/mikiZX/Allegro5-2d-Platformer-Demo-using-VBOs-and-Tiled-tilemaps ). |
Andrew Gillett
Member #15,868
January 2015
|
I am on Windows. I am not calling al_lock_bitmap, and all put/get_pixel calls are commented out.

I have done some more tests, and found that even when nothing was being drawn, a single call to OutputDebugString (Microsoft's function for writing to Visual Studio's debug output) caused the enclosing profiling region to vary randomly between low numbers (e.g. 0.5ms, still not great for a single debug print but not out of the ordinary for that function) and very high numbers, sometimes over 10ms - see attached image.

Problem solved? I re-enabled the drawing code and commented out OutputDebugString, but performance was no better - still in the region of 40-80ms per frame. Both drawing the sprites to the backbuffer and drawing the backbuffer to the screen vary by up to a factor of 2, although the latter is more stable.

Does it matter that the backbuffer does not have power of 2 width/height? |
MikiZX
Member #17,092
June 2019
|
For sure the Allegro wizards that follow this forum will be able to provide a definite answer but I've searched Allegro's GitHub repository for 'power-of-two' and 'pot' and it says this (seems to be the case for both OpenGL and DirectX): |
Edgar Reynaldo
Major Reynaldo
May 2007
|
In Allegro 5 you hardly ever use memory bitmaps. The A5 way is drawing from a tile atlas. You have about 22 unique tiles on that screen at 32x32; laid out as a 2x11 grid that's 64x352 pixels, which would easily fit in an atlas that was 512x512 or larger.

And of course OutputDebugString will slow your program down, writing to the console like that. It's much faster to write to a file.

EDIT

My Website! | EAGLE GUI Library Demos | My Deviant Art Gallery | Spiraloid Preview | A4 FontMaker | Skyline! (Missile Defense) Eagle and Allegro 5 binaries | Older Allegro 4 and 5 binaries | Allegro 5 compile guide |
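The atlas arithmetic in this post can be written out as a small C helper. This is a minimal sketch, assuming tiles are packed row by row with no padding; the `TileRect` type and the `atlas_tile_rect` function are hypothetical names, not part of Allegro.

```c
/* Source rectangle of one tile inside an atlas image, in pixels. */
typedef struct {
    int x, y, w, h;
} TileRect;

/* Hypothetical helper: locate tile `index` in an atlas whose tiles are laid
 * out row by row, `columns` tiles per row, each `tile_size` pixels square,
 * with no padding between tiles (an assumption of this sketch). */
TileRect atlas_tile_rect(int index, int columns, int tile_size)
{
    TileRect r;
    r.x = (index % columns) * tile_size;
    r.y = (index / columns) * tile_size;
    r.w = tile_size;
    r.h = tile_size;
    return r;
}
```

With 22 tiles of 32x32 in 2 columns, the last tile (index 21) ends at x = 64 and y = 352, matching the 64x352 figure above. The resulting rectangle is the kind of region one would hand to `al_draw_bitmap_region` or `al_create_sub_bitmap`.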
kenmasters1976
Member #8,794
July 2007
|
Edgar Reynaldo said: The A5 way is drawing from a tile atlas

What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Quote: Show game loop and rendering code

I second this; it could be useful for doing a benchmark on different systems.

I've noticed that Allegro 5 performance drops when drawing lots of bitmaps, but I always blamed it on having an old graphics card and using open source drivers. I've never done any serious benchmark, but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU. As I said, though, I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.
|
dthompson
Member #5,749
April 2005
|
kenmasters1976 said: What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Using 22 separate bitmaps probably won't be an issue, but once you start getting into the hundreds, I'd imagine you'd start seeing a serious dip in performance (or increased memory usage). I'm being quite unscientific here though; I'm just aware that it's better to have fewer discrete textures.

Quote: I've never done any serious testing and just considered using the low level primitives in cases when you need to draw lots of bitmaps.

Yes indeed - it'll be even faster if you're using vertex buffers. |
Polybios
Member #12,293
October 2010
|
kenmasters1976 said: What's the difference between using a 512x512 atlas (together with sub-bitmaps, I assume) and using 22 different source bitmaps? Does it affect performance?

Yes! IIRC, drawing from different source bitmaps is the equivalent of several OpenGL texture changes. This will slow down drawing and should be avoided if possible (e.g. use an atlas with sub-bitmaps, or even sort by source bitmap if possible). So it's not about the number of source bitmaps but the number of switches between them. IIRC, this should be addressed before attempting to use vertex buffers. The GPU is happiest if you set up "state" once and then only send geometry or alter matrices. |
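The "sort by source bitmap" idea above can be illustrated with a short C sketch. It uses a hypothetical `DrawCmd` record with an integer `texture_id` standing in for the source bitmap; none of these names come from Allegro. It only counts texture switches before and after sorting and does not draw anything.

```c
#include <stdlib.h>

/* One queued draw: which source texture it uses and where it goes.
 * texture_id is a hypothetical stand-in for a source bitmap handle. */
typedef struct {
    int texture_id;
    float x, y;
} DrawCmd;

/* qsort comparator ordering commands by texture_id. */
static int cmp_by_texture(const void *a, const void *b)
{
    const DrawCmd *da = a;
    const DrawCmd *db = b;
    return (da->texture_id > db->texture_id) - (da->texture_id < db->texture_id);
}

/* Group draws that share a source texture so they become consecutive. */
void sort_draws_by_texture(DrawCmd *cmds, size_t count)
{
    qsort(cmds, count, sizeof *cmds, cmp_by_texture);
}

/* Count how often two consecutive draws use different textures. */
size_t count_texture_switches(const DrawCmd *cmds, size_t count)
{
    size_t switches = 0;
    for (size_t i = 1; i < count; ++i) {
        if (cmds[i].texture_id != cmds[i - 1].texture_id)
            ++switches;
    }
    return switches;
}
```

Six draws that alternate between two textures cause five switches in submission order and one after sorting. Note that sorting changes draw order and `qsort` is not stable, so this only suits draws within a layer where overlap order does not matter. Allegro also has `al_hold_bitmap_drawing`, which is intended for batching consecutive draws that share a source bitmap.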
MikiZX
Member #17,092
June 2019
|
kenmasters1976 said: I've never done any serious benchmark but I've noticed CPU usage increasing considerably when drawing lots of bitmaps, which always seemed a bit off to me considering that Allegro 5 is supposed to do its drawing on the GPU
I doubt this would be due to your graphics card (unless it is over maybe 15 years old). The likely reason the CPU usage would increase is 'feeding' the GPU with the data it needs to draw - sort of 'bottlenecking' the GPU-CPU data transfer on the CPU side, as the CPU will not be able to feed the data to the GPU at the rate at which the GPU can draw. This is, I believe, the main reason one would like to have the CPU send data only once to the GPU (using a VBO) as opposed to sending the data many times (once per bitmap you wish to draw).

From my experience (not very extensive, mind you) it is faster to give the CPU the task of re-creating the VBO each frame and then drawing that than to send each bitmap individually. The atlas idea is used here because one VBO draw call is bound to only one bitmap (or a single set of bitmaps if a special shader is used). If you want to draw different bitmaps with VBOs you will need to create one VBO per bitmap you use (and this will again increase the amount of CPU-GPU talk). Thus packing all your bitmaps into a single one and having only one VBO draw call will/should be the fastest way. |
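As a rough illustration of "re-creating the vertex data each frame and submitting it once", here is a C sketch that appends one textured quad per tile to a single array. The `TileVertex` layout and `append_tile_quad` are hypothetical; a real Allegro program would fill `ALLEGRO_VERTEX` entries instead. Texture coordinates are assumed to be in atlas pixels.

```c
#include <stddef.h>

/* Hypothetical vertex layout: screen position plus texture coordinates,
 * here assumed to be measured in atlas pixels. */
typedef struct {
    float x, y;
    float u, v;
} TileVertex;

/* Append the two triangles (6 vertices) of one tile to `out` and return the
 * new vertex count. (dx, dy) is the destination, (sx, sy) the position of
 * the tile inside the atlas, `size` the tile edge length in pixels.
 * The caller must make sure `out` has room for 6 more vertices. */
size_t append_tile_quad(TileVertex *out, size_t count,
                        float dx, float dy, float sx, float sy, float size)
{
    const TileVertex tl = { dx,        dy,        sx,        sy        };
    const TileVertex tr = { dx + size, dy,        sx + size, sy        };
    const TileVertex bl = { dx,        dy + size, sx,        sy + size };
    const TileVertex br = { dx + size, dy + size, sx + size, sy + size };

    /* First triangle: top-left, top-right, bottom-left. */
    out[count++] = tl;
    out[count++] = tr;
    out[count++] = bl;
    /* Second triangle: top-right, bottom-right, bottom-left. */
    out[count++] = tr;
    out[count++] = br;
    out[count++] = bl;
    return count;
}
```

After looping over all visible tiles, the whole array could be handed to the primitives addon in a single call (for example `al_draw_prim` with a triangle list and the atlas as its texture), which is the one-draw-call-per-atlas arrangement described above.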
Edgar Reynaldo
Major Reynaldo
May 2007
|
If you don't need tiling, atlases and sub-bitmaps are the way to go. If you need tiling, you need either a |
kenmasters1976
Member #8,794
July 2007
|
Well, that's some interesting info on VBOs which is not immediately obvious when using Allegro 5, in particular if you come from using Allegro 4 in the past. I always thought 'hardware accelerated' meant the graphics card would take care of all the drawing with little to no load on the CPU. I also assumed that loading a bitmap as a video bitmap meant it was loaded as a texture in video memory, from where the graphics card could access it and draw it with no load on the CPU either, even when using the traditional Allegro 4 way of drawing things. But apparently it is passing the coordinates/geometry to the GPU that can slow down the process? This requires a whole new way of thinking about the drawing process in Allegro 5.
|
Chris Katko
Member #1,881
January 2002
|
Can you just post the whole project so we can profile it?

I have no problems holding 60 FPS... on a netbook... with an i3 celeron and intel HD graphics while drawing multiple tile layers (floors, walls, "decals", "dirt", and lighting layers) in 1366x768. I don't use VBOs or display lists or anything "fancy".

It's even possible you're measuring wrong. If you're running Windows, use one of those nVidia (or Windows [Windows Key + G] or whatever) FPS meters. Linux might have an equivalent.

Also, what's your video mode? You're not in some kind of creepy 24-bit mode? (Somehow different than your images in memory, forcing a color conversion every frame.)

Letting other people compile and profile your project will eliminate many of these variables. |
Andrew Gillett
Member #15,868
January 2015
|
I've done some more profiling and I'm getting better results than I had been getting before – so it's possible there was something else going on, like a rogue logging call somewhere.

Here are some profiling results from a simplified test where it draws around 2000 32x32 sprites per frame. Render1 is drawing the sprites and Render2 is scaling the backbuffer to the screen. In this test, the screen resolution is 1680x1050 and the back buffer is 1888x1120 (although most levels will be considerably smaller than this).

Allegro 5
Allegro 4 (Render2 uses allegro_gl_make_texture_ex and draws a quad, Render1 uses draw_rle_sprite)

In this case you can see that Allegro 5 is faster at drawing the small sprites but slower at scaling the back buffer to the screen as compared to the AllegroGL code I'm using in the old version. I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step.

The players in my game are drawn by creating a temporary bitmap, copying the relevant head and body parts to the bitmap, and then drawing that to the screen – with additional steps if lighting and/or transparency are needed. In the Allegro 4 version they are also modified in real time using get/put pixel (primarily replacing the default player colour with the desired player colour, but also sometimes for a shield effect which puts a white outline around the player). I know this kind of pixel replacement is a non-starter for Allegro 5. Even without any lighting or transparency effects, in my test the Allegro 5 version averages 8ms to draw 16 players (49 draw sprite calls), while the Allegro 4 version runs about 20% faster. Given that the pixel replacement stuff is not feasible for Allegro 5, I will have to generate and store the sprites at the start of the level, so that should help performance.

Both machines I've tested on are around six or seven years old and have Intel on-board graphics.
Timing uses QueryPerformanceCounter. Something strange I noticed is that when I request a list of graphics modes in Allegro 5, all the modes are reported as having 23 bit colour depth. However, when I print the actual bitmap colour depths, they are all 32. |
dthompson
Member #5,749
April 2005
|
Just to be absolutely sure: what frame rates are you getting? Could any of this be vsync-related? I notice that all of the 'Render2' timings are over 16.6ms (some of them very close), which is close to the frame time of a 60Hz display with vsync: 1000ms / 60 ≈ 16.67ms. |
Andrew Gillett
Member #15,868
January 2015
|
vsync is off. I just tested it on a much better PC and got these timings (window resolution was 1768x992): Render1: 6.0ms Render2: 3.9ms Frame: 9.9ms |
kenmasters1976
Member #8,794
July 2007
|
Andrew Gillett said: I'm going to look into the practicality of scaling the sprites directly to the screen rather than going through the additional render2 step

In my recent Allegro 5 project I set up an Allegro transformation so that all drawing is automatically scaled to fit the screen/window size. It seems to work pretty well for 2D.

Andrew Gillett said: I just tested it on a much better PC and got these timings

Is the Allegro 4 timing still better?
|
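The numbers behind such a fit-to-window transformation can be sketched in C. This is only the arithmetic, assuming a uniform scale that keeps the whole scene visible and centres it; `FitTransform` and `fit_to_window` are hypothetical names, not Allegro API.

```c
/* Uniform scale and centring offsets for fitting a scene of one size into
 * a window of another size without changing its aspect ratio. */
typedef struct {
    float scale;
    float offset_x, offset_y;
} FitTransform;

/* Hypothetical helper: compute the largest uniform scale at which a
 * src_w x src_h scene fits inside a dst_w x dst_h window, plus the
 * offsets that centre the scaled scene. */
FitTransform fit_to_window(float src_w, float src_h, float dst_w, float dst_h)
{
    FitTransform t;
    float sx = dst_w / src_w;
    float sy = dst_h / src_h;
    t.scale = (sx < sy) ? sx : sy;                  /* keep everything visible */
    t.offset_x = (dst_w - src_w * t.scale) / 2.0f;  /* centre horizontally */
    t.offset_y = (dst_h - src_h * t.scale) / 2.0f;  /* centre vertically */
    return t;
}
```

For the 1888x1120 level on a 1680x1050 screen mentioned earlier in the thread, this gives a scale of roughly 0.89 with a vertical border of about 27 pixels on each side. In Allegro the values would typically be applied with `al_identity_transform`, `al_scale_transform`, `al_translate_transform` and `al_use_transform`.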
Andrew Gillett
Member #15,868
January 2015
|
Timings from a Paperspace cloud PC, which is faster than mine:

al5
al4

EDIT: I originally wrote that the Al5 version varies a lot in terms of frame timings, but I just realised that the Al4 version is showing smoothed-out timings (as this text was originally written to the screen and is hard to read if the numbers are varying a lot every frame), whereas I changed this in the Al5 version.

I tried scaling sprites directly to the screen rather than using an intermediate backbuffer. These are the timings I got on my desktop - a little better than before but not much.

Render1: 8.6ms Render2: 13.9ms Frame: 22.6ms

The problem is that the scaling ratio is not a whole number, so in this level I end up with a 1 pixel gap every 2 or 3 tiles. |
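One possible way to avoid gaps with a fractional scaling ratio is to compute each tile's destination from shared, rounded edges instead of scaling every tile's width separately. The C sketch below shows only that arithmetic; the function names are hypothetical, and whether this removes the seams in the poster's game is an assumption, since texture filtering can also produce them.

```c
/* Destination pixel where tile boundary `i` lands when a level made of
 * `tile_size` pixel tiles is scaled by dst_extent / src_extent.
 * Uses integer arithmetic and rounds to the nearest pixel. */
int scaled_edge(int i, int tile_size, int src_extent, int dst_extent)
{
    long long pos = (long long)i * tile_size * dst_extent;
    return (int)((pos + src_extent / 2) / src_extent);
}

/* On-screen width of tile column `i`: the distance between its two scaled
 * edges, so each tile ends exactly where its neighbour begins. */
int scaled_tile_width(int i, int tile_size, int src_extent, int dst_extent)
{
    return scaled_edge(i + 1, tile_size, src_extent, dst_extent)
         - scaled_edge(i, tile_size, src_extent, dst_extent);
}
```

For a 1888 pixel wide level of 32 pixel tiles drawn 1680 pixels wide, each column comes out 28 or 29 pixels wide and the 59 columns add up to exactly 1680. The edge and width could then be used as the destination x and width in `al_draw_scaled_bitmap`, with the same approach applied vertically.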
Edgar Reynaldo
Major Reynaldo
May 2007
|
Download my binaries and try ex_draw_bitmap after running RunA525Examples.bat from a command line. https://bitbucket.org/bugsquasher/unofficial-allegro-5-binaries/downloads/ |
Andrew Gillett
Member #15,868
January 2015
|
I get 60fps +- 2, varies between around 1200 and 2500 / sec. With 1024 sprites, 29fps +- 0, 140-230/sec. |
Edgar Reynaldo
Major Reynaldo
May 2007
|
Ok, second question. Are you running on an integrated GPU or a dedicated card? What are your CPU specs and GPU model(s)? Have you updated your drivers since purchasing the card?

EDIT |
Andrew Gillett
Member #15,868
January 2015
|
Intel Core i5-3570K with integrated GPU, latest drivers.

As mentioned previously, I put in some code to confirm that none of the bitmaps being drawn to are memory bitmaps.

But... I just tried ALLEGRO_NO_PRESERVE_TEXTURE again, having tried it a while back, and now I get vastly better Render2 performance.

Render1: 8.5ms Render2: 0.5ms Frame: 9.0ms |
Edgar Reynaldo
Major Reynaldo
May 2007
|
Those numbers are much better. With the D3D driver, Allegro automatically tries to back up textures, which is very slow at times. Try the OpenGL driver and see what your numbers are like. |