Improving Web Page Classification with two Novel Approaches on Semi Supervised Learning



Ünal H. E., Özel S. A.

İzmir International Conference on Technology and Social Sciences (IICTSS 2022), İzmir, Türkiye, 17-19 August 2022, p. 25

  • Publication Type: Conference Paper / Abstract
  • City of Publication: İzmir
  • Country of Publication: Türkiye
  • Page Numbers: p. 25
  • Çukurova University Affiliation: Yes

Abstract

The amount of information on the Web grows every second, and most of it is unlabeled. Effective approaches are therefore needed to derive useful information from this large body of unlabeled data. In this study, two novel semi-supervised learning methods are proposed, and their results are compared with those of the Co-Training and Iterative Cross-Training methods from the literature. In the first proposed method (Incremental Parallel Training with Cross-Validation), the classifiers work in parallel and a validation rule is applied to enlarge the labeled set. In the second method (Incremental Serial Training), three classifiers are combined and unlabeled examples are used serially to form a labeled set. Experiments are conducted on nine binary classification datasets drawn from the publicly available WebKB and Banksearch collections and from the individually collected Conference datasets. The results are analyzed statistically using SPSS. According to these analyses, both proposed methods achieve high classification performance, and the Incremental Parallel Training with Cross-Validation method achieves the highest classification performance among all methods compared.
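The abstract does not say which base classifiers or which validation rule the Incremental Parallel Training with Cross-Validation method uses, so the sketch below is only a generic illustration of the underlying idea of several classifiers working in parallel to grow a labeled set. It is not the authors' method. It assumes a hypothetical unanimous-agreement rule (an unlabeled page is pseudo-labeled only if every classifier predicts the same class) and hypothetical nearest-centroid classifiers, each trained on its own subset of features ("view"). The names `train_centroid`, `parallel_agreement_training`, `views`, `max_rounds`, and the toy vectors are all illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, int]


def train_centroid(view: Sequence[int], data: List[Example]) -> Callable[[Vector], int]:
    """Hypothetical base learner: nearest-centroid classifier that only
    looks at the feature indices listed in `view`."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(view))
        for j, idx in enumerate(view):
            acc[j] += x[idx]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid, on this view only.
        def dist(y: int) -> float:
            return sum((x[idx] - centroids[y][j]) ** 2 for j, idx in enumerate(view))
        return min(centroids, key=dist)

    return predict


def parallel_agreement_training(
    labeled: List[Example],
    unlabeled: List[Vector],
    views: List[Sequence[int]],
    max_rounds: int = 10,
) -> Tuple[List[Example], List[Vector]]:
    """Grow the labeled set round by round; return (labeled, still_unlabeled)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Retrain one classifier per view on the current labeled set.
        classifiers = [train_centroid(v, labeled) for v in views]
        accepted: List[Example] = []
        remaining: List[Vector] = []
        for x in pool:
            votes = {clf(x) for clf in classifiers}
            # Hypothetical validation rule: accept a pseudo-label only if all agree.
            if len(votes) == 1:
                accepted.append((x, votes.pop()))
            else:
                remaining.append(x)
        if not accepted:  # nothing new passed the rule, so stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    seed = [([0.0, 0.1, 0.0, 0.2], 0), ([1.0, 0.9, 1.0, 0.8], 1)]
    unlabeled = [[0.1, 0.0, 0.2, 0.1], [0.9, 1.0, 0.8, 1.0], [0.2, 0.9, 0.1, 0.8]]
    grown, leftover = parallel_agreement_training(seed, unlabeled, views=[[0, 1], [2, 3]])
    print(len(grown), len(leftover))  # prints: 4 1
```

In this toy run, the two clear-cut examples are pseudo-labeled in the first round, while the third stays unlabeled because the two views disagree about it. A strict agreement rule like this trades coverage for fewer labeling errors; a real system could substitute any classifiers and any validation rule.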