Stjepan Rajko wrote: [...]
OK, I just completed a small experiment on the 9 texts of the Brown Corpus categorized as "humor". I used 6 of the texts for training and the remaining 3 for testing.
I created one submodel per tag (http://kh.aksis.uib.no/icame/manuals/brown/INDEX.HTM#bc6), trained each on the training data, and then connected the submodels into a larger model whose transitions were also trained on the training data.
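The library code itself isn't in this message, but the construction amounts to a first-order HMM: one emission model per tag, glued together by transition probabilities. A toy sketch of that shape in Python (my naming, and heavily simplified; unseen words simply defeat the tagger, which matches the "not tagged" case in the results below):

import math
from collections import defaultdict

def train(tagged_sents):
    # Count emissions (tag -> word) and transitions (previous tag -> tag).
    # Each tag's emission distribution plays the role of one "submodel";
    # the transition counts are what connect them into the larger model.
    emit = defaultdict(lambda: defaultdict(int))
    trans = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sents:
        prev = "<s>"                       # sentence-start pseudo-tag
        for word, tag in sent:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            prev = tag
    return emit, trans

def viterbi(words, emit, trans):
    # Most likely tag sequence, or None if a word never occurred in training.
    def logp(counts, key):
        return (math.log(counts[key] / sum(counts.values()))
                if counts.get(key) else float("-inf"))
    best = {"<s>": (0.0, [])}              # tag -> (log prob, path so far)
    for w in words:
        nxt = {}
        for tag, wc in emit.items():
            if w not in wc:
                continue
            nxt[tag] = max((p + logp(trans[pt], tag) + logp(wc, w), path + [tag])
                           for pt, (p, path) in best.items())
        if not nxt:
            return None                    # unseen word: give up
        best = nxt
    return max(best.values())[1]

Usage would be something like viterbi(sentence_words, *train(training_sents)), with training_sents being lists of (word, tag) pairs.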
Here are the results:
Out of 7159 tagged parts of speech (words, symbols, etc.) present in the 3 test texts:
- 5190 were tagged correctly
- 300 were tagged incorrectly
- 1669 were not tagged, because the word or symbol was not present (at least not in a verbatim form) in the training data
So, if you only consider the 7159 - 1669 = 5490 parts that could possibly be tagged based on what the training data covers, you get a 5190/5490 ≈ 94.5% success rate.
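For the record, the arithmetic (numbers taken straight from the counts above):

correct, wrong, untagged = 5190, 300, 1669
total = correct + wrong + untagged   # 7159
coverable = total - untagged         # 5490
print(correct / coverable)           # 0.9453... -> the 94.5% figure
print(correct / total)               # 0.7250... if untagged items count as misses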
By using a larger training set, the number of non-tagged parts should go down. Also, I'm sure there are domain-specific tricks for improving the results; one common one is sketched below.
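For example (just a guess at what could help here, not something I've tried): a standard unknown-word heuristic is to back off to the word's suffix when the word itself was never seen in training:

from collections import defaultdict

def guess_tag(word, emit):
    # Fall back to the most frequent tag among training words that share
    # the longest suffix (up to 4 characters) with the unknown word.
    # emit is the tag -> word -> count table from training.
    for n in range(min(4, len(word)), 0, -1):
        counts = defaultdict(int)
        for tag, wc in emit.items():
            for w, c in wc.items():
                if w.endswith(word[-n:]):
                    counts[tag] += c
        if counts:
            return max(counts, key=counts.get)
    return None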
BTW, 95% of the work to get this done was putting together the code that reads the corpus, since I already have generic code that does this kind of experiment.
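In case anyone wants to reproduce this, a minimal reader for the raw word/tag format would look roughly like the following (my guess at the layout, not the actual code; real distributions vary, so adjust accordingly):

def read_brown_file(path):
    # Parse one raw Brown Corpus file into tagged sentences, assuming
    # word/tag tokens with one sentence per line.
    sents = []
    with open(path) as f:
        for line in f:
            pairs = [tok.rsplit("/", 1) for tok in line.split() if "/" in tok]
            if pairs:
                sents.append([(w, t) for w, t in pairs])
    return sents

(These days NLTK also ships the tagged corpus, e.g. nltk.corpus.brown.tagged_sents(categories='humor'), which would replace most of such a reader.)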
That is most impressive! I am looking forward to analysing the work you did and using it for German, too (which will be more complex, if I am not mistaken). Alas, as I wrote earlier, I have to patiently complete some other things before diving in :-)
Great! I hope to have things cleaned up and better documented by then.
Thanks for your efforts! I really appreciate it! Regards, Roland