basic dx_ball project(like breakout)
yukcell

hi guys.
Our department give us a game project on c programming.this project includes bricks ,ball and bar(paddle).we have a problem about ball's bounce.The ball doesnt bounce from the bar(paddle).how can we make this?can you help us?Thanks everyone

Kris Asick

if (ball.y > PADDLE_Y) ball.y_dir = -ball.y_dir;

I'm going to be frank: If you can't figure something like this out then you aren't anywhere near ready to code a game. I recommend reading through some game programming tutorials before you go any further with your project.

Arthur Kalliokoski

@Kris: My guess was that it's getting stuck in the paddle, changing direction every time in the loop.

Dizzy Egg

Oh that's easy to do, just program it so it bounces off the paddle.

yukcell

it was a my first experience.we study on 1. grade so before we dont study on allegro.yeah we need change ball's direction off the paddle.Also we study on c programming.Can you write codes or algorithm?

Dizzy Egg

Yeah sure, just make the ball bounce off the paddle and that'll work, ok?

Good.

yukcell

hey Dizzy;
we couldnt make to ball bounce off the paddle.the ball stuck in paddle.ball couldnt move when it hits the paddle.Can you help me ?can you write code about this?

Kris Asick

yukcell, did you even read my response? I said you are NOT READY to work on this game of yours because you are asking questions that show you do not have the most basic skills necessary.

If you truly want to make this game, what you need to do is search for some online tutorials to learn the basics of game design and game programming. No one here is going to program your game for you; This forum is for sharing bits of code and asking why they're not working. If you have no code to show nor a specific coding problem to solve then we can't help you. :-/

Same goes for Compile Error in the following post if he happens to also be you or is working on your team: http://www.allegro.cc/forums/thread/611316

Audric

yukcell, as harsh as it can sound, it's true. Kris gave you the technical answer in the first line of the first response to your post. If you don't understand yet what that line of code meant, go see your teacher and ask him.

Specter Phoenix

Seem to be seeing this a lot lately on a lot of forums. People who have never coded a single application coming to sites wanting help to make Pong/Breakout. The one I still love to see is guys that are beginners at best claiming to be putting a team together to make a MMO-RPG/FPS/etc.

Dizzy Egg

They do make me smile...."I have to write a game, can you please show how"

Lolz.

Luiji99

If he doesn't recognize that first line of code then (s)he should take a programming tutorial before even touching on game anything.

Raidho36

Really now, game programming is an art. It requires extreme programming skills to make a decently running game, unless it's something as simple as mario or arkanoid. But even those two were assembly masterpieces back then when they came out, it's just modern hardware would let you waste it's processing power because it has vast of it.

Specter Phoenix

Programming is an art, game, application, or OS makes no difference. With the tools out there it really doesn't take that extreme skills to make a game. If you make a game maker game or torque engine game you are doing game programming using a scripted language that is designed to make game programming easier. Truth is that everyone uses libraries and APIs to make games and rarely make everything from scratch. Even allegro at times seems like it is just a wrapper for multiple cross platform APIs (libpng, libogg, physfs, etc). Game programming is hard if you are wanting to learn C/C++/C#/Java/D etc, but in the same breath it is easy too due to all the tools companies have brought out for allowing those interested in game design to make a functioning game with little effort.

Luiji99

I think that game programming would still be pretty easy if we just used OpenGL directly and wrote a simple FreeImage-based PNG-to-easierToLoadFormat converter setup. OpenGL is fairly easy, and easierToLoadFormat is ingenious.

Audio is where I might folly.

Arthur Kalliokoski

I'd say that when you get the OpenGL and the audio working, you're 25% done with a game. OTOH, if by "OpenGL working" you mean you've already finished level design, you might be 50% - 60% done.

Luiji99

You say "OpenGL working" in quotes as if I said that. This is weird.

My point is that programming the game isn't mostly setting up OpenGL. In fact, that has nothing to do with what I said. My point is that it isn't an massive setback to flunk out Allegro and libraries that are like it. Albeit, if you want something robust with support for PNG, JPEG, OGG, software-level virtual file systems, easy portability, etc., Allegro is definitely an important thing to use (unless you want to reinvent some serious wheels).

Raidho36

Well, since allegro is basically a wrapper with some extras, it doesn't have much to do with game programming itself.

The difference between game programming and generic programming is that in generic programming stability and precision always go first with robustness just a bit behind them, and in game programming performance is everything and nothing else is even slightly important. If you need a lot of extra execution code to fix some undesired behavior that would add considerable delays, in generic programming you just add that code, but in game programming you can really just throw it away and get to go with that bug, or come up with "slightly unstable" code that would be really short and won't impact performance, where "slightly" is very widely ranged. Say, in generic programming you wouldn't even consider using fast inverse square root, but in games that invention turned out to be incredibly handy. Of course modern hardware offers much faster and precise direct inverse square root computation, but the idea remains the same: anything goes for the sake of running fast. And "running fast" is an absolute value. If something runs fast on your I7 it doesn't mean it is actually fast. It could be as much as 0.5 FPS fast on p3. For such cases there's very common approach which states "screw them", but you can't really use it unless you create a monster like doom4 or diablo3 that would naturally require hell of a processing power.

By the way, I was looking into allegro's sources and it really lacks of game techniques exploitment. It would constantly call malloc/free in events queue for an instance, which is a fine way to program a software, but bad way to program a game. In video game, you allocate your memory once and then you use whatever you have. That's because continiously allocating and deallocating small chunks of memory creates a memory fragmentation, which is virtually unsolvable within mehtods reasonable for game programming, apart from being slow by itself. Another thing is probably brought in by plusesers. That thing is application of C++ patterns where they're don't belong, which only adds up confusion and slightly increases execution time and memory consumption for absolutely no reason other than just having them there.

here was a huge rant about C++ and coders that use it, but it is now absent because they're not susceptible to reasoning and it would cause nothing but holy war

Luiji99

I almost TL;DR'd but got interested. I'd say that Allegro is a bit generic in that you can write non-games with it, but fairly game-oriented in that the majority of non-games would have extra bulk and weird designs on it. I.e. a media player running on an event queue might not be as natural as with a game.

On the note of the event queue...it seems very un-optimum to do so many (de)allocations every frame. I wonder of there's a way to optimize this out (at least partially).

Raidho36

Easy: allocate as much as required to contain all the events that could possibly arrive between two discardtions. I believe, 4096 bytes events array would be satisfying for most cases. And if happens to be not, it could be easily re-allocated to 8192 bytes. That makes for a single malloc and free call and one or maybe two realloc calls, ever. That would also save up a bunch of memory compared to initial approach, because every single malloc creates 12 through 48 bytes overhead, depending on implementation.

Peter Wang
Raidho36 said:

It would constantly call malloc/free in events queue for an instance,

Where?

Raidho36

Let me just download sources again... there.

#SelectExpand
1void _al_event_queue_push_event(ALLEGRO_EVENT_QUEUE *queue, 2 const ALLEGRO_EVENT *orig_event) 3{ 4 ALLEGRO_EVENT *new_event; 5 ASSERT(queue); 6 ASSERT(orig_event); 7 8 _al_mutex_lock(&queue->mutex); 9 { 10 new_event = alloc_event(queue); 11 copy_event(new_event, orig_event); 12 ref_if_user_event(new_event); 13 14 /* Wake up threads that are waiting for an event to be placed in 15 * the queue. 16 */ 17 _al_cond_broadcast(&queue->cond); 18 } 19 _al_mutex_unlock(&queue->mutex); 20} 21 22static ALLEGRO_EVENT *alloc_event(ALLEGRO_EVENT_QUEUE *queue) 23{ 24 ALLEGRO_EVENT *event; 25 unsigned int adv_head; 26 27 adv_head = circ_array_next(&queue->events, queue->events_head); 28 if (adv_head == queue->events_tail) { 29 expand_events_array(queue); 30 adv_head = circ_array_next(&queue->events, queue->events_head); 31 } 32 33 event = _al_vector_ref(&queue->events, queue->events_head); 34 queue->events_head = adv_head; 35 return event; 36} 37 38static void expand_events_array(ALLEGRO_EVENT_QUEUE *queue) 39{ 40 /* The underlying vector grows by powers of two. */ 41 const size_t old_size = _al_vector_size(&queue->events); 42 const size_t new_size = old_size * 2; 43 unsigned int i; 44 45 for (i = old_size; i < new_size; i++) { 46 _al_vector_alloc_back(&queue->events); 47 } 48 49 /* Move wrapped-around elements at the start of the array to the back. */ 50 if (queue->events_head < queue->events_tail) { 51 for (i = 0; i < queue->events_head; i++) { 52 ALLEGRO_EVENT *old_ev = _al_vector_ref(&queue->events, i); 53 ALLEGRO_EVENT *new_ev = _al_vector_ref(&queue->events, old_size + i); 54 copy_event(new_ev, old_ev); 55 } 56 queue->events_head += old_size; 57 } 58} 59 60void* _al_vector_alloc_back(_AL_VECTOR *vec) 61{ 62 ASSERT(vec); 63 ASSERT(vec->_itemsize > 0); 64 { 65 if (vec->_items == NULL) { 66 ASSERT(vec->_size == 0); 67 ASSERT(vec->_unused == 0); 68 69 vec->_items = al_malloc(vec->_itemsize); 70 ASSERT(vec->_items); 71 if (!vec->_items) 72 return NULL; 73 74 vec->_unused = 1; 75 } 76 else if (vec->_unused == 0) { 77 char *new_items = al_realloc(vec->_items, 2 * vec->_size * vec->_itemsize); 78 ASSERT(new_items); 79 if (!new_items) 80 return NULL; 81 82 vec->_items = new_items; 83 vec->_unused = vec->_size; 84 } 85 86 vec->_size++; 87 vec->_unused--; 88 89 return ITEM_START(vec, vec->_size-1); 90 } 91}

Is there anything I missing?

Thomas Fjellstrom

It's a dynamic vector array, it only grows once in a while. In most use cases, it'll only realloc a few times. It's size will double every time it needs more room, after a few resizes, it'll never need to expand again (unless you let the queue backup and never drain).

Raidho36

Oh, that line...
if (vec->_items == NULL) {
And couple of ithers... I must've been reading the code diagonally when I thought it allocates space per event.

But what's with re-allocating to space two times bigger than necessary? It works okay if initial size is 1 unit, i.e. it would reallocate to 2 units, that makes perfect sense, but with 128 units it would reallocate to 256 units, while only giving 129 units data length i.e. just one more unit size increase to vectors' mind. It is certainly a memory waste. And this time I read the code line by line and then symbol by symbol so I'm fairly sure of it.

On the second thought, it supposed to double the vector size. But then the adjustment should be like this instead:

vec->_items = new_items;
vec->_unused = vec->_size;
vec->_size *= 2;

And if so, vector allocation is called old * 2 times, which grows the vector size in cubic progression, i.e. 0 -> 1 -> 2 -> 8 -> 1024 -> 67108864 -> 288230376151711744. It is enough to simply call it once to double it's size. But the thing that really confuses me here is that vectors' allocate back for some reason thinks that it uses up one unit. Absense of comments doesn't help at all.

Also, reallocation pointer is supposed to be void pointer, to match vector's _items field (and because *alloc returns void pointers).

Peter Wang
Raidho36 said:

But what's with re-allocating to space two times bigger than necessary?

It's a standard technique. Other expansion factors are possible; see, for example: http://en.wikipedia.org/wiki/Dynamic_array

Quote:

grows the vector size in cubic progression

Don't know what you're talking about.

Quote:

But the thing that really confuses me here is that vectors' allocate back for some reason thinks that it uses up one unit

It allocates a slot for the caller to fill.

Quote:

Also, reallocation pointer is supposed to be void pointer, to match vector's _items field

The _items field is declared as char *. We perform pointer arithmetic on it (see ITEM_START) which, in standard C, would return casting it to char * anyway.

By the way, the average length of an event queue is zero.

Raidho36

>It's a standard technique.
Yes I know that, what I meant is that it is actually unnecessary if the size is only increased by one unit not by factor two.
>Don't know what you're talking about.
Maybe I misused some term or even made up one. I meant that power of two is growing not by one each time, but by some larger value which grows every iteration, too. See this:

for (i = old_size; i < new_size; i++) {
    _al_vector_alloc_back(&queue->events);
}

And then this:
char *new_items = al_realloc(vec->_items, 2 * vec->_size * vec->_itemsize);
Assuming that vector is supposed to double the size, it makes for a multiple size doubling whenever new size exceeds 2. Say, for new size 4 allocation would be called 2 times, each one doubles the vector size, effectively increasing it's size 8 times, which would be 16 units. Next iteration calls allocation 8 times, 256 times size increase which makes 4096 units. And so on.
>It allocates a slot for the caller to fill.
Ok, but it does not increases this counter when allocates an event. Appending an array would increase it, but it isn't called anywhere. What's the use then?
>the average length of an event queue is zero
I believe that peak length is what matters.

Peter Wang
Raidho36 said:

Yes I know that, what I meant is that it is actually unnecessary if the size is only increased by one unit not by factor two.

You don't know ahead of time how many will be eventually required. Of course you can choose to grow the vector by only one each time, if you are trying to minimise space (never mind roundup applied by the memory allocator or increased chance of heap fragmentation).

Quote:

for (i = old_size; i < new_size; i++) {
    _al_vector_alloc_back(&queue->events);
}

This code is a bit silly, but all it does is grow the vector by (new_size - old_size) slots. I think I was lazy to add a separate function to allocate multiple at once. I doubt it would make any difference in practice.

Raidho36 said:

char *new_items = al_realloc(vec->_items, 2 * vec->_size * vec->_itemsize);

You missed the if (vec->_unused == 0) condition directly above it.

Raidho36 said:

Ok, but it does not increases this counter when allocates an event.

What counter? I think you're missing the fact that the event queue is implemented as a circular array on top of an underlying _AL_VECTOR.

Quote:

I believe that peak length is what matters.

The point is, event queues are typically very short. A single image or sound sample will generally take more space than any event queue.

Raidho36

>never mind roundup applied by the memory allocator or increased chance of heap fragmentation
Actually those two are avoided another way than reallocating to twice as large. It is only necessary to reallocate whatsoever if you want random access by offset, like, just calculate and dereference it. Instead, you could've returned an address where data is pushed in and don't do any calculations at all when fetching it, and when size is exceeded another data storage is created. Access by ID wouldn't be significantly slower than real random access since only requires an extra couple of very simple operations, and you don't use it much anyway because you got direct persistent pointers as explained above.
>event queues are typically very short
Obviously. Even a huge load of events won't exceed space required to store 32bpp 64x64 image. But that's not the point.
>You missed the if (vec->_unused == 0) condition directly above it.
Wait, I think I got it. Whenever unused slots are zero, vector size is doubled. Then unused is an inverse of array_size if it was there, and the size is to know how much is occupied. However, each allocation "increases" vector length by just one. So in events queue code to actually double it's size allocation is called old times. I think it makes some sense now, but really, what were you smoking when came up with such vague and awkward design? And couldn't an events queue be actually a queue? That would make more sense than emulating a queue through a vector. And it would probably work faster for being more specific. There could be even special queue events_queue_queue specifically to handle events fastest way possible. But okay, whatever. I would probably just suggest use of my data structures library once I finish it. It also could've be an addon, that would be handy allegro feature.

Peter Wang
Raidho36 said:

Actually those two are avoided another way than reallocating to twice as large. It is only necessary to reallocate whatsoever if you want random access by offset, like, just calculate and dereference it. Instead, you could've returned an address where data is pushed in and don't do any calculations at all when fetching it, and when size is exceeded another data storage is created. Access by ID wouldn't be significantly slower than real random access since only requires an extra couple of very simple operations, and you don't use it much anyway because you got direct persistent pointers as explained above.

I think you're thinking of a segmented array? That's way more complicated than we need and has a higher constant factor.

Quote:

But that's not the point

Then what is? You agree that the space is negligible, and by your initial (incorrect) assertion that memory is allocated per event, I take it you agree that event queues should be optimised for speed.

Quote:

And couldn't an events queue be actually a queue?

A queue is an abstract data type, which still needs to backed by a concrete data structure. A circular array is a common choice (another being a linked list, which we previously used). A generic dynamically-sized vector is nothing strange either. Maybe the specifics aren't to your taste but that's all.

Raidho36

That's way more complicated than we need and has a higher constant factor.

It is not any more complicated than vector. It won't provide random access, instead it would provide persistent pointer, which is faster to fetch cache miss or not. You can do this with regular arrays, too, but ability to maintain physical addresses on resize gives huge benefit. Unless you use such array not by it's purpose, of course, which is storing data when address persistence is critical. For queues you're not interested in locations, you only care for head and tail. Therefore you can even throw away any indexing whatsoever and only use head and tail pointers. I don't know for you, but I think it's simplier and more straightforward than current implementation of queue on top of vector.

Quote:

You agree that the space is negligible

It doesn't mean it could be simply wasted. Now I see it doesn't actually wasted, it's just there is kind of indistinct way of allocating the space for another event.

Quote:

initial (incorrect) assertion

Sorry about that, it's just a)for the first time I was reading the code like I read advertisment booklet on the elevator wall and b)there's no comments to explain such a knotted up method of completing the task. But yes, speed is important.

Quote:

A queue is an abstract data type, which still needs to backed by a concrete data structure.

A queue is not a data type for that matter. The only backing it needs is a block of memory.

Quote:

A circular array is a common choice (another being a linked list, which we previously used)

Linked lists are superflous for that task, and being flawy themselves. Circular vector is already better, but it could still be better than that. There could be features traded over performance. It could be optimized queue as I suggested before. It is not so much different from linked list actually. My vision of it is that list base structure would contain a couple of pointers and basic data about suggested sizes (or hard-coded in with type-specific optimizations); individual cells don't have next`/`previous (only actual data) pointers and are allocated in large groups, which do have a single next pointer that would point to next chunk, last of which would point to first, naturally looping them. It is not possible to perform insertions with such approach, but resizing goes super easy, overflow detection and resolving overhead is next to zero and does not require large contiguous space to work under extreme data load. I also thought of coding in function table that would resolve specific data sizes (1,2,4,8,16) naturally therefore faster, but I haven't tested it actually and since it's not a direct function call it cannot be really optimized, so it could be even slower that way than using memcpy for everything. But I honestly didn't thought so much on this one, it's just the best I came up with so far. But even as it is, it's already much more straightforward: just push in an event and then pop it (or peek or discard or whatever).

gnolam

FFS, learn to use <quote></quote> already. >:(

Raidho36

Formatting quick help says that I can use right angle brackets for quoting. It doesn't says though that following whitespace is mandatory, which is not in most other places.

Either way, I'm not pushing anything. I realize that even with large speed increase, something about 50% or bigger, it's not that significant for the game anyway so no reason to rewrite it unless there's no more important tasks to accomplish. I'm just doing some discussion.

Peter Wang
Raidho36 said:

It is not any more complicated than vector.

I'm not sure, but I think in your last paragraph you were describing a linked list where each list node contains multiple elements. (a "fat list", but Google does not attest to that terminology.) I can see how it would work, but I don't agree it's any simpler or would be any faster.

Quote:

A queue is not a data type for that matter.

Abstract data type.

Raidho36

I'm not sure, but I think in your last paragraph you were describing a linked list where each list node contains multiple elements.

A list that would store multiple elements in a single cell would have enormous computational problems to solve on random insertion and space waste on random deletion. That's not how well designed data structure works. Though yes, using it as a base it is easy to create a queue with many benefits and no performance loss.

Quote:

I don't agree it's any simpler or would be any faster

It will simplify and clearify events.c code as implementation would be within another unit. About speed - it actually will. In your "queue" you use reference function often. All it does is gives you a tail pointer, which you could've simply stored in data structure header. Please don't tell me that calculating a pointer by an offset multiplied by unit size is any faster than just keeping that pointer.

Also, about memcpy. It's really not needed for anything that has solid type, a simple macro with GNU extension will do perfectly. Something like this I guess:
#define push(q,v) (*((__typeof__(v)*)(q)->head)=(v),_push_void(q))
#define pop(q,v) (_pop_void(q),(v)=*((__typeof__(v)*)(q)->tail))

The linked list in misc folder doesn't actually contain any data (even though supposed to), only pointers, so for everything you push in you're supposed to preallocate some long living enough space. Am I get that right?

Another thing is bstrlib. All it does is virtually adds up fool-proofing on top of standard C string library. Oh, and lets you know string length in one operation, that's really handy. There's some other handy features, too, which I appreciate, but it also does a lot of work that not needed, which basically dictated by language the library is written in. Say, in C one would check data validity in advance of using it rather than hope that things would go all right on their own. In fact, C dictates such programming way that in a good program it is safe to assume that incoming data is always valid with exception to arbitrary external input. It also adds up inconsistency: some function accept ustr and others just char*. And even if using unicode, why it would be UTF-8 and not UTF-16/32 simply using wide characters? UTF-8 does saves up some space (50% ascii-only and about 20% most of the time), but introduces ridiculously huge processing overhead that cannot be surpassed by design for every single string operation except a couple of operations that affect entire string at once, like, not a part of it. Neither of UTF-16/32 suffer from this, they're just like regular C strings, only use wider characters. I mean seriously, if the header stores string length and you could've just fetch it, why would you even write that code below?

#SelectExpand
1size_t al_ustr_length(const ALLEGRO_USTR *us) 2{ 3 int pos = 0; 4 int c = 0; 5 6 while (al_ustr_next(us, &pos)) 7 c++; 8 9 return c; 10}

That seems very inconsiderate to me. Though having a built-in function to convert strings to UTF-8 to write them to UTF-8 encoded text files is nice.

Luiji99

Wouldn't cubic progression mean that the size would be cubed every expansion? It seems more like square progression if such a term exists.

Peter Wang

raidho36: Sorry, I really find your writing very unclear. If you want to discuss the data structure further I think you need to provide code.

A couple of things:

Raidho36 said:

Also, about memcpy. It's really not needed for anything that has solid type

Sure, as we do in copy_event. It's not that clear cut though, as modern compilers will specialise certain memcpy calls where the size argument is a constant. Or the other way, a structure copy will be implemented by a memcpy call.

Quote:

And even if using unicode, why it would be UTF-8 and not UTF-16/32 simply using wide characters?

Compatibility with C strings.

Quote:

introduces ridiculously huge processing overhead

I don't find it so. UTF-16 is variable width anyway, even if many programs don't handle it properly, and often you end up having to convert to/from UTF-8. UTF-32 is too wasteful.

(The Allegro UTF-8 routines are not very optimised so don't take that as the benchmark. It's possible to do better.)

Quote:

I mean seriously, if the header stores string length and you could've just fetch it, why would you even write that code below?

You need that for UTF-16 as well.

I find the ustr functions useful sometimes. If you don't like it, don't use it.

By the way:

Quote:

UTF-8 does saves up some space (50% ascii-only and about 20% most of the time)

ASCII really dominates. You'd think a page full of Chinese characters would be more efficiently represented in UTF-16 than UTF-8. Here's the numbers from http://tw.yahoo.com/index.html

290062 bytes (UTF-8)
574982 bytes (UTF-16)

Turns out the content of the page is tiny compared to the HTML markup and Javascript.

Raidho36

Sorry, I really find your writing very unclear.

If you speaking of me describing mentioned fat list problems, it was my misunderstanding of concept of using large chunk of memory to store nodes. Supposedly they're stored just like regular nodes, but instead of allocating space per-node, space is pre-allocated for multiple nodes, while my thought at the moment was a "list" that doesn't use next/previous pointers and stores data in a plain array fashion. Allocating more space at once is a most obvious optimization to first obvious implementation of linked list. With such approach you will need to index your memory cells usage though.

Quote:

modern compilers will specialise certain memcpy calls where the size argument is a constant

In your list, it is a variable. Just because a variable doesn't change, it doesn't makes it a constant. Keyword const doesn't forces anything to be constant either, just forbids direct value modification, and variables declared as constant don't count as actually constant. Not like you don't already know any of this. Anyway, in your code copying a variable into the list doesn't involve any unit size values explicitly. But really, for an abstract list you wouldn't write "copy this" and "copy that" functions for every single type you would store in, right? Though that is one of acceptable solutions. What's your thoughts on suggested macro? I mean apart from no thread safety (which is only a couple of extra functions) and obvious remark that it won't work with many compilers. It could be always extended with writing in explicitly a variable type into third argument to avoid using "`typeof`", which I find exceptionally handy. Though using expressions as a typeof argument tend to produce `int`s and `double`s, but when passing a variable it works perfectly.

Quote:

Compatibility with C strings.

All of unicode is backwards compatible with C strings (unless you plain copy C string into unicode string without doing a conversion of course, which is simply silly) and none of unicode is forward-compatible with them, but wide characters are constant-sized while UTF-8 is variable-sized, therefore prior are handled better. I mean yes, UTF-8 is a unicode text files encoding standard de-facto, but for internal data storing it's better to use already decoded strings. It's not even compatible with system unicode functions and needs to be decoded to constant-size encoding before use (AFAIK).

Quote:

UTF-16 is variable width anyway

Oh, now I remember. Yes, true. There's just a bit of confusion for internal unicode storage that would occupy 4 bytes on linux but only 2 bytes on windows. That gave me an (apparently wrong) idea that windows' internal unicode strings storage is equal to UTF-16 encoding. Now that is cleared up, I'd like to say that all that time I actually meant simply wide character strings which represent unicode characters. UTF encoding has nothing to do with it, except that 4-byte wide character string precisely matches UTF-32 encoded string. Sorry for confusing you.

Quote:

UTF-32 is too wasteful

Oh come on, using up up to extra 3 bytes of memory per character is wasteful and executing loads of code on every single string operation is not? Besides, just a few posts earlier you said that such sizes (similar to what a set of strings would take) are negilible in comparison to what a texture or a sound would take. You don't store a compressed PNG file into memory "as is" and decompress it each time to access it's pixel data just because a plain bitmap in native pixel format is "wasteful", right? Why even consider opposite approach with strings then? And the size difference isn't even that significant unless you encode, like, an entire english Wikipedia with it. Plus, 2 bytes is the most common size of UTF-8 character (when used for multicultural characters) anyway, which still would store several magnitudes (of two) smaller character range than wide character. That effectively equalizes 3-byte UTF-8 character to 2-byte wide character. Of course with linux, wide character would occupy 4 bytes no matter what, so wide character string would never consume less space than UTF-8 string in practice. Moreover, two highest bytes of wide character would probably never used. Still, wide character string offers to operate strings using something as simple as wcsncat and even memcpy, which is a perfectly good trade of memory over performance. You know, memory/processing rarity was extremely biased towards memory back in the times, but now it is completely opposite, so nothing is wrong with giving away some memory space that could've instead been saved in cost of lots of computations. Seriously, modern hardware offers 8 GB of ram as a common value, and several gigabytes of swap on top of that. Processing power is large, too, a magnitude of tens of gigahertz though. Still, there's always a meaningful use of it rather than constantly grinding the data over and over whenever it is necessary to perform any operation on it.

Quote:

I find the ustr functions useful sometimes.

I already mentioned there are useful and handy ustr functions. It's just the string themselves doesn't have to be encoded for internal storage. Using wide character strings for that is an optimal choice.

Quote:

ASCII really dominates
Turns out the content of the page is tiny compared to the HTML markup and Javascript.

That's exactly the reason why your argument is invalid: you display statistics for a web page source, not for a text that it actually contains. You could've as well suggested plain XML file to form such statistics and effeciency conclusions. And even if actually ise XML, it's nevertheless faster to modify it if it's represented in plain fashion rather than variable-width encoded, and it won't hurt anybody if data was internally stored as wide character string and only converted to UTF-8 just before flushing. In terms of data stream bandwidth decrease, you should instead pay attention to data compression rather than layout optimization. Prior typically achieves many times better compression for text files, especially those that repeat themselves most of the time, such as XML.

Most of UTF-8 handling code doesn't exploit UTF-8 handling techniques. That's odd to see right next to fairly smooth vector data structure implementation.

Once again, just discussion.

Peter Wang
Raidho36 said:

What's your thoughts on suggested macro?

I'm not sure where you want to use it, but it's probably unnecessary. Like I said, the compiler specialises memcpy with a constant size argument so you can write it portably and efficiently as memcpy(dst, src, sizeof(TYPE)); or memcpy(dst, src, sizeof(*src));

Quote:

All of unicode is backwards compatible with C strings

Specifically, we are talking about byte-oriented strings terminated by a NUL byte (no interior zeroes). You can pass pointers to UTF-8 encoded strings to functions expecting 8-bit strings and usually get something sensible, e.g. fprintf to write out strings to a file, strlen to count the size, strchr/strrchr to search for ASCII character or UTF-8 code units. And UTF-8 is fully general, so you if use that for everything you don't need to provide both 8- and 16-bit versions of functions with string arguments.

Quote:

It's not even compatible with system unicode functions and needs to be decoded to constant-size encoding before use (AFAIK).

Depends on the system. I guess you mean file system paths and the like.

The Unix world got Unicode support late and uses UTF-8 encoded strings pretty much everywhere. Microsoft and Java got Unicode earlier (defined it, I believe) when it was still a 16-bit code... until it wasn't, and they ended up with a variable-width encoding anyway. If they could, I wonder if they would choose UTF-16 again? Not sure.

And you generally don't need to decode to a constant size encoding before use.

Quote:

Oh come on, using up up to extra 3 bytes of memory per character is wasteful and executing loads of code on every single string operation is not? Besides, just a few posts earlier you said that such sizes

You come on. Strings get pretty big, and you have a lot of them. Just that single web page was 290k, and over 1MB in UTF-32. Whereas a typical Allegro program has a handful of event queues and a peak length of, what, four elements each?

Are there any systems that use UTF-32 exclusively? The only one that I know of that tried was Python3 (on Unix), which has now adopted flexible string representations.

Executing loads of code: not true. Quite often the algorithm is exactly the same as for ASCII, or you end up taking the ASCII fast path because all the interesting characters are in the ASCII range.

Quote:

That's exactly the reason why your argument is invalid: you display statistics for a web page source, not for a text that it actually contains.

Just an illustration. Strings in the real world are "marked up", and the markup is ASCII.

Anyway, that's it from me. You can find opinions for UTF-8 and UTF-16 both all over the net, but few supporters of UTF-32.

Raidho36

I'm not sure where you want to use it

Everywhere outside definition unit to push data into the queue. So you just pass to this macro your queue and your variable, and the macro stores the data into queue taking into account size of variable supplied in a natural fashion.

Quote:

memcpy(dst, src, sizeof(TYPE))
sizeof(*src)

The only problem is that variable type is a virtual entity and you can't store it or even compare it to anything, and that the functions substitute argument types, so this trick can only be used within such macro. It is moot that variable sized memcpy is slower than constant sized one. A little research shows that memcpy will most likely be as fast as direct assignment anyway, and probably even faster under certain circumstances.

Quote:

You come on.
suggests another XML example
Strings in the real world are "marked up"

No you come on with XML. I already told you that it's not a valid argument because game texts are not XML, nothing even close to markupped text with exception to some rare and specific cases. Whereever you've seen that over 90% of the game text data was the huge heap of system-related or any other keywords? I've seen that 90% of the game text data was a text to display "as is" which is the case with allegro, provided it doesn't support formatting anyway and doing it manually isn't what many do. I mean literally, on that web page you mentioned the meaningful text / formatting data relation was right next to zero. I suggest you to cut that out and be realistic about in-game string data. Another thing is that game is (typically) not a novel to contain any more than 512 kilobytes of text total. I'm not talkning about storing levels in text files, they aren't stored in the ram as a text any longer that is necessary to load a level. More likely there's just a single string of a file at any given moment. Neither entirely stored in memory for long time any other configuration files. They are being loaded and read (or write) and immediately discareded. Not like one-pass destinated text is even relevant on that topic, but still.

Well, even that. Let us assume that in some program I'd want to display 256 HTML pages simultaneously, like, in a 16x16 grid, and they're all animated and everything, so I can't just render them and discard into hard disc cache. And not just any page, but those that include loads of formatting that they would be something about 1-2MB with 4-byte wide character strings. That would take up something about 512MB. 512 vs ~130 is considerable, that's true. But then imagine that there's a script on that page that would modify DOM. Say, it would replace some footer DIV innerHTML. With UTF-8 you can't just jump there and do your thing. Instead, you grind through 250000 characters before you find the right place to even start doing it. Does the save of space worth perpetual slowness? Of course real browser stores data in a different fashion so there will be a tree of DOM elements where each one contains little to no text data at all, which is generated in a single pass anyway, so unavailability of random access isn't much relevant, but you get the idea. Again with sprites: you don't hack the video driver over waste of space because when you load 34x5 sprite you end up with 64x64 surface and because you only use 3 bits per texel but internally texture will have 32 bits per texel anyway? Сonsider how much space could've been saved on that, a lot, right? So best off with software rendering only; screw slowness and screw hardware acceleration, better off without it at all because it won't provide optimized enough data layout.

Once again, real game won't store this much data in the RAM, at any given moment most of the data will be on the hard drive and in there it could easily be both UTF-8 and compressed, volia, no problem. But in the game memory it's best to store data plain accessible. I don't mean UTF-32 or anything, I mean wide character strings.

Quote:

Quite often the algorithm is exactly the same as for ASCII

Implying operations that involve counting characters one by one. Appending a string to the end of another string does not involve such with C strings, wide or standard. Neither does copying a string section. Not to mention string comparison. Speaking of hardware acceleraiton, there is an assembly instructions for that, and I hardly doubt strcpy in stdlib is compiled with them.

Quote:

You can find opinions for UTF-8 and UTF-16 both all over the net, but few supporters of UTF-32.

I do not claim that there's something wrong with UTF-8 nor that UTF-32 is the single most epic awesome encoding. I say that internally keep an encoded data when you can store it unencoded for best performance is a bad practice and having wide character strings to store unicode strings is a subject for considering, which apparently have been thrown away without such.

Thread #611313. Printed from Allegro.cc