Monday, August 1, 2016

Talking about text classifiers

Over the last year I have been reading about machine learning while trying to detect spam in my private projects. Along the way I came across KNN, random decision forests, and Naive Bayes.

As a result, I wrote a C++ library to classify texts, plus some slides for a presentation, which you can view at the end of this blog post.

I chose Naive Bayes because it is one of the simplest classifiers, based on Bayes' theorem with strong (naïve) independence assumptions. It is one of the most basic text classification techniques, with applications in email spam detection, document categorization, sexually explicit content detection, personal email sorting, language detection, and sentiment detection (something close to NLP, I think). Despite the naïve design and oversimplified assumptions, Naive Bayes performs well in many complex real-world problems. Another good thing: Naive Bayes is light on CPU and memory, so it suits environments with limited resources.
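To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier in C++. It is not the code of my library: the class name `NaiveBayes`, the methods `train` and `classify`, the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions chosen for illustration.

```cpp
#include <cctype>
#include <cmath>
#include <limits>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal multinomial Naive Bayes (illustration only).
class NaiveBayes {
public:
    // Count every word of `text` under the class `label`.
    void train(const std::string& label, const std::string& text) {
        ++docs_[label];
        ++total_docs_;
        auto& counts = word_counts_[label];   // make sure the entry exists
        int& total = words_in_class_[label];
        for (const std::string& w : tokenize(text)) {
            ++counts[w];
            ++total;
            vocabulary_.insert(w);
        }
    }

    // Return the class with the highest log posterior, or "" if untrained.
    std::string classify(const std::string& text) const {
        std::string best;
        double best_score = -std::numeric_limits<double>::infinity();
        const std::vector<std::string> words = tokenize(text);
        for (const auto& entry : docs_) {
            const std::string& label = entry.first;
            // log P(class): share of training documents with this label
            double score = std::log(static_cast<double>(entry.second) / total_docs_);
            const auto& counts = word_counts_.at(label);
            const double denom =
                words_in_class_.at(label) + static_cast<double>(vocabulary_.size());
            // log P(word | class) with add-one smoothing, summed over all words
            for (const std::string& w : words) {
                auto it = counts.find(w);
                const double c = (it == counts.end()) ? 0.0 : it->second;
                score += std::log((c + 1.0) / denom);
            }
            if (score > best_score) {
                best_score = score;
                best = label;
            }
        }
        return best;
    }

private:
    // Split on whitespace and lowercase each word.
    static std::vector<std::string> tokenize(const std::string& text) {
        std::vector<std::string> out;
        std::istringstream in(text);
        std::string w;
        while (in >> w) {
            for (char& ch : w)
                ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
            out.push_back(w);
        }
        return out;
    }

    std::map<std::string, int> docs_;                              // documents per class
    std::map<std::string, std::map<std::string, int>> word_counts_; // word counts per class
    std::map<std::string, int> words_in_class_;                    // total words per class
    std::set<std::string> vocabulary_;                             // all words seen in training
    int total_docs_ = 0;
};
```

You would call `train("spam", ...)` and `train("ham", ...)` with a few labeled texts and then ask `classify(...)` for the most likely label. Working with sums of logarithms instead of products of probabilities avoids underflow on long texts, and the per-class word counts are the only state kept, which is why the technique stays cheap in CPU and memory.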

To improve detection accuracy I use a DFA (deterministic finite automaton), which is useful for matching patterns. Each matched pattern goes into a ranking, and each ranking is tied to one classification. You can view the code here. To build your automaton you can use Flex, Bison, or any other way you like...
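As a rough illustration of the DFA idea, the sketch below hand-codes a tiny automaton that recognizes a "price" pattern (a `$` followed by one or more digits) and adds a weight to a ranking for each match. The pattern, the weight of 2, the "spam" label, and the function names `step`, `count_price_patterns`, and `rank_text` are all hypothetical. The real patterns and the way the ranking is combined with the classifier are in my code and slides, not here.

```cpp
#include <cctype>
#include <map>
#include <string>

// States of a hand-written DFA for the pattern: '$' followed by digits.
// Digits is the accepting state.
enum class State { Start, Dollar, Digits };

// One DFA transition: the next state for state `s` on input character `c`.
State step(State s, char c) {
    const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    if (c == '$') return State::Dollar;  // a '$' always starts a new candidate
    switch (s) {
        case State::Start:  return State::Start;
        case State::Dollar: return digit ? State::Digits : State::Start;
        case State::Digits: return digit ? State::Digits : State::Start;
    }
    return State::Start;
}

// Run the DFA over the text; count one match each time it moves
// from Dollar into the accepting state Digits.
int count_price_patterns(const std::string& text) {
    State s = State::Start;
    int matches = 0;
    for (char c : text) {
        const State next = step(s, c);
        if (s == State::Dollar && next == State::Digits) ++matches;
        s = next;
    }
    return matches;
}

// Hypothetical ranking: every matched pattern adds a fixed weight
// to the class it is tied to (here: price patterns count toward "spam").
std::map<std::string, int> rank_text(const std::string& text) {
    std::map<std::string, int> ranking;
    ranking["spam"] += 2 * count_price_patterns(text);
    return ranking;
}
```

With more patterns you would not write the transitions by hand. A generator such as Flex can build the automaton from regular expressions, and each rule's action can bump the ranking of its class.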

If you look at the presentation, slide number 12 shows my point of view on using the ranking to improve the accuracy of the classifier's results.

So, this is a very cool trick to gain accuracy. No more words, friends. Thank you for reading this!

Cheers!
