Yi-Ning Tu, Associate Professor, Fu Jen Catholic University, Statistics and Information Science, Corresponding Author.
Peng-Hsuan Lee, Waseda University, Fundamental Science and Engineering (Japan).
ABSTRACT
The choice of similarity calculation, particularly the well-known cosine similarity, strongly affects the training results of text mining models. A serious problem arises when the lengths of the compared texts are highly imbalanced. In this situation, the bag of words becomes very large, and the two texts are unlikely to share many terms in the word vector matrix. Even when terms do match, their contribution is diluted by the large number of terms that differ between the two texts. This study proposes an algorithm to address these problems and presents a case study of adapted anime and their original light novels. In the case study, the proposed similarity measure replaces the traditional cosine similarity. The study uses 32 original online light novels published after 2000 and the online episode summaries of the 32 corresponding adapted anime to find key terms and calculate the similarity between them. The results show that different genres of anime exhibit different relationships between similarity and popularity. The study also provides some strategies based on the analytic results.
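The dilution effect described above can be illustrated with a minimal sketch of the traditional bag-of-words cosine similarity. This is not the proposed algorithm, which the abstract does not specify. The function name `cosine_similarity`, the whitespace tokenization, and the toy texts (including the `filler` terms) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Traditional cosine similarity over raw bag-of-words term counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: a short summary and two "novels" that share
# exactly the same five terms with it.
summary = "hero joins guild fights dragon"
short_novel = "hero joins guild fights dragon saves village"
# The long text adds 200 unrelated terms, mimicking a length imbalance.
long_novel = short_novel + " " + " ".join(f"filler{i}" for i in range(200))

sim_short = cosine_similarity(summary, short_novel)
sim_long = cosine_similarity(summary, long_novel)
print(f"short vs. summary: {sim_short:.3f}")
print(f"long vs. summary:  {sim_long:.3f}")
```

Both comparisons share the same five matching terms, yet the score against the long text is much lower because its vector norm grows with every unrelated term. This is the imbalance that a length-aware similarity measure would need to compensate for.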