Over the last year I have been studying machine learning, mainly to detect spam in my private projects. Along the way I looked at KNN, random decision forests and naive Bayes.
As a result, I wrote a C++ library to classify texts, plus some slides for a presentation, which you can view at the end of this blog post.


To improve detection accuracy, I use a DFA (deterministic finite automaton) to match patterns and add each match to a ranking. Each ranking maps to one classification. You can view the code here. Another way to build your automaton is to use Flex and Bison.
On slide 12 of the presentation you can see my point of view on using the ranking to improve the accuracy of the classifier's results.
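To make the idea concrete, here is a minimal sketch of DFA matching feeding a ranking score. It is not the code from my library: the names `PatternDfa`, `build_dfa` and `rank_score`, and the per-pattern weights, are assumptions made up for this example. Each pattern is compiled into a transition table (the classic KMP-style construction), and every time the automaton reaches its accepting state the pattern's weight is added to the score.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical structure: one DFA per pattern, plus the weight a match adds.
struct PatternDfa {
    std::vector<std::array<int, 256>> next; // next[state][byte] -> state
    int accept;                             // accepting state (pattern length)
    int weight;                             // assumed score per match
};

// Build a substring-matching DFA for a non-empty, lowercase pattern.
PatternDfa build_dfa(const std::string& pattern, int weight) {
    assert(!pattern.empty());
    const int m = static_cast<int>(pattern.size());
    PatternDfa d;
    std::array<int, 256> zero{};            // every byte goes back to state 0
    d.next.assign(m + 1, zero);
    d.accept = m;
    d.weight = weight;
    d.next[0][static_cast<unsigned char>(pattern[0])] = 1;
    int x = 0;                              // restart state after a mismatch
    for (int j = 1; j <= m; ++j) {
        d.next[j] = d.next[x];              // copy the mismatch transitions
        if (j < m) {
            const unsigned char c = static_cast<unsigned char>(pattern[j]);
            d.next[j][c] = j + 1;           // matching byte advances the state
            x = d.next[x][c];
        }
    }
    return d;
}

// Run every DFA over the text (case-insensitive) and sum the match weights.
int rank_score(const std::string& text, const std::vector<PatternDfa>& dfas) {
    int score = 0;
    for (const auto& d : dfas) {
        int state = 0;
        for (unsigned char c : text) {
            state = d.next[state][std::tolower(c)];
            if (state == d.accept) score += d.weight;
        }
    }
    return score;
}
```

You could then compare the score against a threshold, or hand it to the classifier as one extra feature. Which of the two works better depends on your data, so treat the weights as something to tune.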
So, this is a very cool trick to gain accuracy. No more words, friends. Thank you for reading this!
Cheers!
References:
- Natural Language Processing by Dan Jurafsky, Christopher Manning
- John, G. H. and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. Montreal, Quebec, Canada.
- Svore, K. M., Wu, Q., and Burges, C. J. (2007). Improving web spam classification using rank-time features. Banff, Alberta, Canada.