Machine Learning for the Detection of SMS Spam


For more than two decades, the popularity of text messages has increased continuously. At the same time, the abuse of the short message service (SMS) in the form of SMS spam has also increased, causing high costs and data security issues for mobile users as well as service providers. This working paper applies machine learning concepts to identify possible SMS spam automatically and thereby improve the service quality of telecommunication companies. Spam recognition is based on the classification of text messages using (a) Bayes' theorem, where the text message is split into n-grams to determine the likelihood that it is spam (the use of vertical n-grams is a new approach here), and (b) a multilayer perceptron (neural network) trained with the backpropagation learning algorithm. The ROC (receiver operating characteristic) curve and corresponding measures are used to compare and evaluate both methods.
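As an illustration of approach (a), the following is a minimal sketch of a naive Bayes classifier over n-grams. It is not the implementation from the working paper: it assumes ordinary word bigrams rather than the vertical n-grams proposed there, adds Laplace smoothing, and uses a tiny invented training set. All names (`ngrams`, `NGramNaiveBayes`, `spam_probability`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Split a message into lowercase word n-grams (plain, not vertical)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NGramNaiveBayes:
    """Naive Bayes over word n-grams with Laplace smoothing (assumed here)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            self.docs[label] += 1
            self.counts[label].update(ngrams(text, self.n))
        self.vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        return self

    def spam_probability(self, text):
        """Return P(spam | message) via Bayes' theorem in log space."""
        total_docs = sum(self.docs.values())
        log_post = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(self.docs[label] / total_docs)  # class prior
            for gram in ngrams(text, self.n):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.counts[label][gram] + self.alpha) / denom)
            log_post[label] = score
        # Normalise the two log scores into a probability.
        top = max(log_post.values())
        weights = {k: math.exp(v - top) for k, v in log_post.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Toy data invented for illustration only; not taken from the paper.
train_messages = [
    "win a free prize now",
    "free prize claim now",
    "are we meeting for lunch",
    "see you at lunch today",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NGramNaiveBayes(n=2).fit(train_messages, train_labels)
p_spam_like = model.spam_probability("claim your free prize")
p_ham_like = model.spam_probability("see you at lunch")
```

On this toy data, `p_spam_like` comes out above 0.5 and `p_ham_like` below 0.5. Sweeping the decision threshold over such probabilities on a labelled test set is what yields the points of a ROC curve.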

In ild Schriftenreihe Logistikforschung, No. 35