Comparative evaluation of string similarity measures for automatic language classification.