|
Sound recording basics |
clarkd
Member #7,931
November 2006
|
Hey all! After reading through the documentation and the source for Allegro, I have come to the conclusion that I am still a newbie, and I need some help figuring out how to use Allegro's sound capabilities. I am trying to create a program that can take two sound samples and compare them to determine whether they are a close enough match. For the most part I have discovered that sample plays a crucial role, but I am not completely sure how it works. I see that the sound data is stored in memory and reached through pointers, but I was looking at the example posted by spellcaster and noticed that the pointer is basically incremented by the length of the buffer. If that is the case, what happens when part of the buffer is not filled and the pointer is still incremented by the full size of the buffer? Won't it leave some empty space with silence? Furthermore, I see that the pointer is assigned to an unsigned char pointer. Does this mean that I can access any byte of the data by just checking the content of that address? I am thoroughly confused as to how to work with the buffer and/or memory so that I can take certain parts of it and compare them to another set of data. Can someone enlighten me as to how the raw data is stored in Allegro? Thanks, Dan |
miran
Member #2,407
June 2002
|
To record sound data you do this:
1. Create a SAMPLE that will be large enough to hold the amount of sound data you want to record. Do this with create_sample().
2. Call start_sound_input() to find out how much data is recorded at a time. You won't get the entire sample filled at once, but instead buffer by buffer, so you have to piece it together yourself.
3. Make a pointer that points to the sample's data member.
4. In a loop, repeatedly call read_sound_input(), giving it the pointer from #3 as the input argument. Each time the call to read_sound_input() succeeds, increment the pointer by the amount start_sound_input() returned. Continue this until all the sample data has been filled. Make sure you don't go past the sample's data buffer size, otherwise your program will crash.
Quote: I am trying to create a program that can take two sound samples and compare them to determine if they are a close enough match.
You will want to do this in the frequency domain. The sample's data is stored in the time domain. You transform it to the frequency domain with the Fourier transform, and you will probably want to use an algorithm such as the Fast Fourier Transform (FFT) for this. Once you've done that for both samples, you can compare them by a number of means. You can calculate the Euclidean distance, or you can search for the location of peaks or something like that. If the difference between the two frequency footprints is small enough, then you can say the samples match. -- |
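The loop in steps 3 and 4 can be sketched without any sound hardware. This is only an illustration, not Allegro code: `read_chunk_fn`, `fill_in_chunks`, `demo_source` and `demo_read` are hypothetical names, with `demo_read` standing in for a capture call that returns non-zero only when a whole chunk has been copied. Under that assumption the pointer moves only after a chunk has been completely filled, so no half-empty gap is left behind, and the loop stops before a chunk would run past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a capture call: copies one chunk of chunk_bytes
   bytes into dst and returns non-zero only if a full chunk was available. */
typedef int (*read_chunk_fn)(void *dst, size_t chunk_bytes, void *state);

/* Fills data (capacity total_bytes) one chunk at a time and returns the
   number of bytes written. max_polls bounds the loop for this demo. */
size_t fill_in_chunks(unsigned char *data, size_t total_bytes,
                      size_t chunk_bytes, read_chunk_fn read_chunk,
                      void *state, int max_polls)
{
    unsigned char *p = data;
    size_t written = 0;

    assert(data != NULL && read_chunk != NULL);
    if (chunk_bytes == 0)
        return 0;

    /* Stop as soon as another whole chunk would not fit in the buffer. */
    while (max_polls-- > 0 && total_bytes - written >= chunk_bytes) {
        if (read_chunk(p, chunk_bytes, state)) {
            p += chunk_bytes;       /* advance only after a successful read */
            written += chunk_bytes;
        }
    }
    return written;
}

/* Demo source: a chunk is ready on every second poll, filled with a counter. */
struct demo_source { int polls; unsigned char next; };

int demo_read(void *dst, size_t chunk_bytes, void *state)
{
    struct demo_source *s = state;
    if (++s->polls % 2 != 0)
        return 0;                   /* nothing ready on odd-numbered polls */
    memset(dst, s->next++, chunk_bytes);
    return 1;
}
```

With a 10-byte buffer and 4-byte chunks, the sketch writes two chunks and leaves the last 2 bytes untouched rather than overrunning the buffer.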
Tobias Dammers
Member #2,604
August 2002
|
Depends on what it is you need to compare. If you're interested in the samples' pitch, then auto-correlating each one and comparing the fundamental pitches may be a better method. For speech recognition, multiple band-pass filters (the analyzer part of a vocoder) may give good results. Otherwise, I'd suggest FFT (with a proper window function). --- |
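The auto-correlation idea can be sketched as follows. This is a minimal, unnormalised version under stated assumptions: the samples are signed and centred on zero, and `estimate_period` is a hypothetical helper name. It returns the lag, in samples, at which the signal best matches a shifted copy of itself; dividing the sample rate by that lag gives a rough fundamental frequency.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the lag in [min_lag, max_lag] with the largest autocorrelation sum,
   i.e. the estimated period in samples. x holds signed, zero-centred samples.
   On ties the smallest lag wins. */
size_t estimate_period(const short *x, size_t n, size_t min_lag, size_t max_lag)
{
    size_t best_lag = min_lag;
    long long best_sum = 0;
    int have_best = 0;

    assert(x != NULL && min_lag >= 1 && min_lag <= max_lag && max_lag < n);

    for (size_t lag = min_lag; lag <= max_lag; lag++) {
        long long sum = 0;
        /* Correlate the signal with itself shifted by lag samples. */
        for (size_t i = 0; i + lag < n; i++)
            sum += (long long)x[i] * x[i + lag];
        if (!have_best || sum > best_sum) {
            best_sum = sum;
            best_lag = lag;
            have_best = 1;
        }
    }
    return best_lag;
}
```

Two recordings could then be treated as having the same pitch when their estimated periods differ by no more than a sample or two. Real recordings are noisy, so a normalised correlation and a sensible lag range would likely be needed in practice.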
clarkd
Member #7,931
November 2006
|
Ok, so now I am wondering, because I have a snippet of code that displays the contents of the memory: how do the values correlate to the sounds? Anyway, look at the code and see if I am correct in my method of checking the data.
Just to explain what I think I am looking at: read_sound_input(buf) takes the buffer, all 88200 bytes in this case, and puts each byte into a memory location. Therefore, each piece of sound data is represented by a numerical value from 0 to 255, since that is the range of values that can be stored in a byte. Also, it appears to be a linked list, so that data is stored in a linear progression. Now I am curious as to how to interpret this, because it looks like the interval is allowing for the buffer to be emptied into the sample every 900ms. That means we are storing roughly 98,000 bytes each second, but if I look at the data that streams out of the memory, it looks random and doesn't seem to correlate to the sound. For example, if I take out the microphone I get data like the following
//edit: I noticed that I get the same values when I run it multiple times. I wonder if this means anything? This makes me wonder if each byte of the sample memory is actually a member of a group of bytes that form a piece of sound data instead of just one byte forming a piece of sound data. Additionally, I know my sound card is probably junk, but I would think that the values that were to be stored would be closer to 0 since the microphone was unplugged. |
miran
Member #2,407
June 2002
|
Quote: how do the values correlate to the sounds Depends on whether you record in 8 or 16 bits and stereo or mono. Your code snippet is wrong. You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions. When you call create_sample(), you should pass it the parameters that you want and that you know your hardware supports (by looking at the previously mentioned values). Then pass the same parameters to start_sound_input(). For example, if you want to record 4 seconds of sound in 8bit mono at 22kHz, you use the get_sound_input_cap_xxx() functions to see if that is supported on your hardware and, if it is, pass 8, 0, 22050 and 4*22050 to the create_sample() function. Then pass 22050, 8 and 0 to start_sound_input(). Then when you read sound input, you will get 22050 8bit values (bytes) for each second of recorded sound. I'm not sure whether Allegro uses signed or unsigned format; you should check the documentation or source, it has to be documented somewhere. If you record in 16bit, then you will get 44100 bytes for each second, that is 22050 16bit values. And if you record in stereo, you will get twice as many values. Again I'm not sure, but I think in stereo Allegro uses the scheme that interleaves left and right channels. That is one value for left, one for right, one for left, one for right and so on. -- |
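The arithmetic above can be written down as a small sketch. The helper names `bytes_per_second` and `split_stereo16` are hypothetical, and the left/right interleaving is the layout guessed at in the post, not something confirmed here.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes produced per second of recording: rate values per channel,
   bits/8 bytes per value, and two channels when stereo is non-zero. */
size_t bytes_per_second(size_t rate, int bits, int stereo)
{
    assert(bits == 8 || bits == 16);
    return rate * (size_t)(bits / 8) * (stereo ? 2u : 1u);
}

/* Splits interleaved 16-bit stereo (assumed order: left, right, left, right)
   into two separate channel arrays of frame_count values each. */
void split_stereo16(const unsigned short *frames, size_t frame_count,
                    unsigned short *left, unsigned short *right)
{
    for (size_t i = 0; i < frame_count; i++) {
        left[i]  = frames[2 * i];       /* even positions: left channel */
        right[i] = frames[2 * i + 1];   /* odd positions: right channel */
    }
}
```

For 22050 Hz this gives 22050 bytes per second in 8-bit mono, 44100 in 16-bit mono and 88200 in 16-bit stereo.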
clarkd
Member #7,931
November 2006
|
Quote: You should read the manual to find the meaning of the values returned by the get_sound_input_cap_xxx() functions. Well, I went through that part of the code and cleaned it up so that it should be correct, but I know that the setup shouldn't have been a problem because it was playing the samples back just fine. However, I discovered that I cannot record at 8 bits. When I tried get_sound_input_cap_rate(8, stereocap); it was returning zero, so I am assuming the sound card on my laptop is crap. Regardless, this is the updated part of the code that checks my hardware, and I am absolutely sure that it is correct. Then on top of this I am using ratecap, bitcap, and stereocap as parameters in create_sample
So I don't see how this can be wrong, however, I think I am going to put zero back in to simplify the data. |
miran
Member #2,407
June 2002
|
Quote: Then on top of this I am using the ratecap, bitcap, and stereocap as parameters in create_sample According to the manual, if your hardware supports both 8bit and 16bit, bitcap will be 24 (that is, 8 | 16). Don't use the value of bitcap; use either 8 or 16, depending on what you're interested in. -- |
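Since 24 is just the 8 and 16 flags combined, the capability value can be tested bit by bit before choosing a depth. A minimal sketch follows; `supports_bits` and `pick_bits` are hypothetical helper names, and `cap` is assumed to hold the value reported for the hardware.

```c
#include <assert.h>

/* Returns non-zero if the capability mask cap includes the given depth. */
int supports_bits(int cap, int bits)
{
    assert(bits == 8 || bits == 16);
    return (cap & bits) == bits;
}

/* Picks a concrete depth to pass on: 16 if available, else 8, else 0. */
int pick_bits(int cap)
{
    if (supports_bits(cap, 16))
        return 16;
    if (supports_bits(cap, 8))
        return 8;
    return 0;   /* no recording depth supported */
}
```

The result of `pick_bits`, never the raw mask, is what would then be used as the bit depth when creating the sample and starting the input.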
clarkd
Member #7,931
November 2006
|
GOT IT!!!...sorta. I should have realized this. A char is only a byte long, so it doesn't fit the data correctly for the 16-bit sound I am using. Therefore, I used unsigned short as the variable for storing and outputting the sound data, but I was getting around 32768 as the output. Now 2^15 equals 32768, so this means that the 16th bit is the sign bit. Nice theory, right? When I tried to use signed short I got numbers around -32768, so it sounds like it is including the 16th bit in the value of the number. However, I have to go to work, so I'll post my results later, but I think I am on the right path to understanding how the data is stored. Thanks, miran, for pounding the simple concepts into my head. Now I just need to find a way to compare the data. Oh, and the best part is that when I increment the pointer it automatically skips the next byte. |
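One reading consistent with these numbers is that the 16-bit values are unsigned with silence sitting at the midpoint, 32768, which shows up as -32768 when the same bits are read as a signed short. The sketch below works under that assumption, plus the assumption of little-endian byte order when assembling a value from two bytes; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Builds one 16-bit sample from two consecutive bytes (little-endian assumed). */
unsigned short sample_from_bytes(unsigned char lo, unsigned char hi)
{
    return (unsigned short)(lo | (hi << 8));
}

/* Re-centres an unsigned sample (silence at 32768) around zero. */
int to_signed16(unsigned short u)
{
    return (int)u - 32768;
}

/* Shows why incrementing an unsigned short pointer "skips the next byte":
   pointer arithmetic moves in units of the pointed-to type. */
size_t short_pointer_step(void)
{
    unsigned short pair[2] = {0, 0};
    return (size_t)((unsigned char *)(pair + 1) - (unsigned char *)pair);
}
```

After `to_signed16`, a silent input should hover around 0, and two recordings can be compared value by value.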
|