Alright, I did some benchmarks on the pixel operations:
[Image: benchmark chart, attachment 603154]
Much to my surprise, the al_put_pixel() and al_get_pixel() functions are quite a bit faster than I imagined they would be. I had thought writing a direct memory write function on a locked region would be faster.
Basically, under no circumstances would you want to use them without first locking your bitmap, as you can see from the chart. As remarkably fast as the functions are when locked, they're equally remarkably slow when not. al_draw_pixel() is reasonable as long as you don't want to draw a lot of pixels.
The locking function itself can be kind of slow, however. On my system, I could probably lock about 30 320x240 bitmaps in their native pixel format and that alone would affect the 60fps frame rate without anything else happening.
Also, there were no noticeable differences between ALLEGRO_LOCK_READONLY, ALLEGRO_LOCK_WRITEONLY, and ALLEGRO_LOCK_READWRITE.
write_pixel_argb_8888() is as follows:
inline void write_pixel_argb_8888(ALLEGRO_LOCKED_REGION *region, int x, int y, ALLEGRO_COLOR &col)
{
   uint32_t *ptr32;
   unsigned char r, g, b, a;
   al_unmap_rgba(col, &r, &g, &b, &a);
   ptr32 = (uint32_t *)region->data + x + y*(region->pitch/4);
   *ptr32 = (a << 24) | (r << 16) | (g << 8) | b;
}
The benchmarks might be more useful if we could see how many seconds per call each function took. Call al_get_time() before and after a set number of calls to each function (fewer calls for the slower functions). Then you can get seconds per call, or calls per second, through simple division. It would give an idea of how long a custom blit function would take.
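The division suggested above could be sketched roughly like this. The helper names (steady_seconds, seconds_per_call, calls_per_second) are made up for illustration, and std::chrono stands in for the clock so the snippet compiles without Allegro. al_get_time() takes no arguments and returns seconds as a double, so it should be usable as the `now` argument instead.

```cpp
#include <chrono>

// Seconds from a monotonic clock. Stand-in for al_get_time(), which has the
// same shape (no arguments, returns double seconds).
inline double steady_seconds() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}

// Hypothetical helper: runs `op` `calls` times and returns the average
// number of seconds a single call took.
template <typename Clock, typename Op>
double seconds_per_call(Clock now, Op op, long calls) {
    const double start = now();
    for (long i = 0; i < calls; ++i) op();
    const double elapsed = now() - start;
    return elapsed / static_cast<double>(calls);
}

// Calls per second is just the reciprocal.
inline double calls_per_second(double secs_per_call) {
    return 1.0 / secs_per_call;
}
```

Something like seconds_per_call(steady_seconds, [&]{ /* one pixel operation */ }, 100000) would then give a per-call figure; use a smaller count for the slow unlocked functions.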
With the method I was using, I recorded the operations with al_get_time() and incremented the number of loops (num_color_placements) to max out the bar to the 60fps-limit line.
start_profile_timer("lock");
if (lock_bitmap) region = al_lock_bitmap(image, ALLEGRO_PIXEL_FORMAT_ARGB_8888, lock_flags);
stop_profile_timer("lock");

start_profile_timer("function");
for (int i=0; i<num_color_placements; i++)
{
   write_pixel_argb_8888(region, 30, 30, base_color);
   //al_put_pixel(30, 30, base_color);
   //al_draw_pixel(30, 30, base_color);
   //al_get_pixel(image, 30, 30);
}
stop_profile_timer("function");
[Image: screenshot of the profiling bars, attachment 603157]
If the bar is right on that light blue line, then all the calls together take exactly one frame at 60fps, or 1/60 of a second. The actual numbers jump around a bit, so it's difficult to get an exact seconds-per-call. The numbers on the right are seconds*10000.
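To turn the displayed numbers into per-call times, the arithmetic would look roughly like this. It is only a sketch: the function names are invented here, and it assumes the displayed value really is seconds*10000 for the whole loop, as described above.

```cpp
// Assumption from the post: the on-screen value is elapsed seconds * 10000.
constexpr double kDisplayScale = 10000.0;
// One frame at 60fps.
constexpr double kFrameSeconds = 1.0 / 60.0;

// Converts a displayed value back to seconds.
constexpr double displayed_to_seconds(double displayed) {
    return displayed / kDisplayScale;
}

// The displayed value that corresponds to exactly one 60fps frame (about 166.7).
constexpr double frame_budget_displayed() {
    return kFrameSeconds * kDisplayScale;
}

// Average seconds per call when `calls` iterations produced `displayed`.
constexpr double displayed_to_seconds_per_call(double displayed, long calls) {
    return displayed_to_seconds(displayed) / static_cast<double>(calls);
}
```

So a bar sitting on the 60fps line with num_color_placements loops works out to about (1/60)/num_color_placements seconds per call.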
Did you try reading directly from the lock's data instead of using al_get_pixel()?
Here's an example ML showed me recently. If you're in the benchmarking mood, I'm curious how much faster it is.
Reading and writing directly from the lock buffer is only useful in certain circumstances. The problem is that you have to lock in the same format the bitmap data is in (ALLEGRO_PIXEL_FORMAT_ANY). If you don't, Allegro will convert to the requested format and then back again when you unlock, which nullifies the reason for reading/writing manually in the first place (speed). You could theoretically support multiple pixel formats, but then you're going to be writing a lot of code.
However, if you KNOW the format of a bitmap is something specific and always will be, writing a single code path to process the locked region directly is going to be faster than put/get pixel. Probably not by enough to make it worth it in most circumstances, but it definitely has its uses in time-critical code.
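A single code path for one known format might look something like the sketch below. LockedRegionView, PixelFormat, and Rgba are hypothetical stand-ins so the snippet compiles without Allegro. With the real library you would check the format field of the ALLEGRO_LOCKED_REGION against ALLEGRO_PIXEL_FORMAT_ARGB_8888 and fall back to al_get_pixel() when it doesn't match.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the Allegro types (not the real API).
enum class PixelFormat { ARGB_8888, Other };

struct LockedRegionView {
    void *data;          // first byte of row 0 of the locked area
    PixelFormat format;  // format the lock actually ended up in
    int pitch;           // bytes between the starts of consecutive rows
};

struct Rgba { unsigned char r, g, b, a; };

// Reads one pixel straight from the buffer. Returns false when the region is
// not in the one format this path handles, so the caller can fall back.
inline bool read_pixel_argb_8888(const LockedRegionView &region, int x, int y, Rgba &out) {
    if (region.format != PixelFormat::ARGB_8888) return false;
    const unsigned char *row = static_cast<const unsigned char *>(region.data)
                             + static_cast<std::ptrdiff_t>(y) * region.pitch;
    std::uint32_t packed;
    std::memcpy(&packed, row + static_cast<std::ptrdiff_t>(x) * 4, sizeof packed);
    out.a = static_cast<unsigned char>((packed >> 24) & 0xFF);
    out.r = static_cast<unsigned char>((packed >> 16) & 0xFF);
    out.g = static_cast<unsigned char>((packed >> 8) & 0xFF);
    out.b = static_cast<unsigned char>(packed & 0xFF);
    return true;
}
```

The format check runs on every call here for clarity; in a tight loop you would probably check once after locking and then run the loop without it.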
y*(region->pitch/4);
I'm not sure if that's guaranteed to work. If pitch is not a multiple of four, then you'll lose the remainder.
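One way to sidestep the question is to do the row arithmetic in bytes and only then step to the pixel, so the pitch is never divided. This is a minimal sketch with a made-up function name; `data` and `pitch` are meant to be region->data and region->pitch, and whether Allegro ever reports a pitch that isn't a multiple of four for a 32-bit format is not something settled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes an already-packed 32-bit pixel at (x, y). The row offset is computed
// in bytes (y * pitch), so nothing is lost if pitch is not a multiple of 4,
// and a negative pitch also works.
inline void write_pixel_32_bytepitch(void *data, int pitch, int x, int y, std::uint32_t packed) {
    unsigned char *row = static_cast<unsigned char *>(data)
                       + static_cast<std::ptrdiff_t>(y) * pitch;
    // memcpy avoids an unaligned uint32_t store when the row start is not 4-byte aligned.
    std::memcpy(row + static_cast<std::ptrdiff_t>(x) * 4, &packed, sizeof packed);
}
```

With a pitch of 10 bytes, pixel (1, 1) lands at byte offset 14, whereas y*(pitch/4) would put it at offset 12.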