Abstract— English is widely used as an international language, and the two varieties most commonly used in society are British English and American English. Despite their similarities, the two varieties differ in fundamental ways, ranging from vocabulary to grammar. English learners therefore need to determine which variety they are learning. To support this, this study builds a text classification system that classifies sentences according to the variety of English they use, with the aim of facilitating the English learning process. The dataset is divided into two classes, British English and American English, and is evaluated using 10-fold cross-validation. The features combine N-grams, Term Frequency-Inverse Document Frequency (TF-IDF) weighting, and an additional word dictionary. During TF-IDF weighting, a Document Frequency (DF) threshold of 2.0 is applied. Classification is performed with a Support Vector Machine (SVM) using a linear kernel, and the best accuracy obtained is 96.53%.
Keywords—text classification, support vector machine, British English, American English.
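The following dependency-free Python sketch illustrates the pipeline summarised in the abstract. It is not the authors' implementation and rests on several assumptions. The DF threshold of 2.0 is read as "discard any n-gram that occurs in fewer than two training documents". The IDF uses one common smoothed variant. The linear-kernel SVM is replaced by a Pegasos-style subgradient solver for the hinge loss. Labels are encoded as +1 for British English and -1 for American English. The word lists `BRITISH_WORDS` and `AMERICAN_WORDS` and all function names are hypothetical, because the paper's actual dictionary is not given here.

```python
import math
from collections import Counter

# Hypothetical dictionary feature: the study's real word list is not reproduced here.
BRITISH_WORDS = {"colour", "favourite", "centre", "realise", "flat", "lorry"}
AMERICAN_WORDS = {"color", "favorite", "center", "realize", "apartment", "truck"}


def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max (lower-cased, whitespace tokenised)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def fit_vocabulary(docs, min_df=2):
    """Return an IDF table for n-grams whose document frequency is >= min_df (assumed DF rule)."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc)))
    n_docs = len(docs)
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1: an assumed variant, not taken from the paper.
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items() if c >= min_df}


def featurize(doc, idf):
    """Sparse TF-IDF vector plus two dictionary-count features and a constant bias feature."""
    counts = Counter(g for g in ngrams(doc) if g in idf)
    vec = {g: tf * idf[g] for g, tf in counts.items()}
    tokens = doc.lower().split()
    vec["__dict_br__"] = sum(t in BRITISH_WORDS for t in tokens)
    vec["__dict_us__"] = sum(t in AMERICAN_WORDS for t in tokens)
    vec["__bias__"] = 1.0
    return vec


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss (fixed sample order)."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * dot(w, x)
            # Regularisation step: shrink every weight by (1 - eta * lam).
            for k in w:
                w[k] *= (1 - eta * lam)
            # Hinge-loss step: move toward the sample only when the margin is violated.
            if margin < 1:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * label * v
    return w


def predict(w, x):
    """+1 = British English, -1 = American English (assumed label encoding)."""
    return 1 if dot(w, x) >= 0 else -1


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(docs, labels, k=10, min_df=2):
    """Mean accuracy over k folds; the vocabulary is refit on each training split."""
    accs = []
    for train, test in k_fold_indices(len(docs), k):
        idf = fit_vocabulary([docs[i] for i in train], min_df)
        w = train_linear_svm([featurize(docs[i], idf) for i in train],
                             [labels[i] for i in train])
        hits = sum(predict(w, featurize(docs[i], idf)) == labels[i] for i in test)
        accs.append(hits / len(test))
    return sum(accs) / len(accs)
```

Refitting the vocabulary inside each fold keeps DF counts from the test fold from leaking into training. In practice, a library pipeline such as scikit-learn's `TfidfVectorizer(ngram_range=(1, 2), min_df=2)` followed by `LinearSVC` would be the usual choice. The hand-written version above only makes each step explicit.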