From classification to communication: using machine learning to distinguish bird sounds
Po-Han Lee1*, Shih-Kai Hong1, Tian-Rui Chung2, Te-Wei Chen3, Kuan-Lin Chen4, Yin-Kuo Wang5
1Affiliated Senior High School, National Taiwan Normal University, Taipei, Taiwan
2Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
3Mathematical Science, National Chengchi University, Taipei, Taiwan
4Taipei Municipal Jianguo High School, Taipei, Taiwan
5Center for General Education and Department of Physics, National Taiwan Normal University, Taipei, Taiwan
* Presenter: Po-Han Lee
To show that bird song has different acoustic characteristics across physiological periods and across species, we use a Japanese white-eye dataset and an XC dataset to address the problems of distinguishing physiological period and bird species, respectively. We divided the Japanese white-eye data into seven physiological periods and took the B-grade data for the 18 most-recorded species from the XC website. Pre-processing consists of noise removal, segmentation, and conversion to time-frequency representations in the form of Mel spectrograms and MFCCs. In the experiments, a convolution module serves as the base and is connected to LSTM, GRU, attention, and BERT-based models; we also combined it with a feed-forward network to form the Conv1D model. The best results were obtained with the Conv1D and BERT-based models, with performance on the two analyses reaching up to 97.7% and 78.9% (macro F1-score), respectively. In addition, using t-SNE dimensionality reduction and visualized attention maps, we further show that bird songs in different physiological periods and of different species do indeed exhibit different characteristics.

Keywords: Japanese white-eye, Machine Learning, MFCC, BERT
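
The conversion step in the pre-processing described above (audio to Mel spectrogram and MFCC) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the abstract does not state the toolchain or parameters, so the function names and the values of `n_fft`, `hop`, `n_mels`, and `n_mfcc` below are assumptions chosen only for illustration, and noise removal and segmentation are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=40):
    """Log-power mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return (10.0 * np.log10(mel + 1e-10)).T

def mfcc(log_mel, n_mfcc=13):
    """MFCCs via an unnormalized DCT-II over the mel axis, shape (n_mfcc, n_frames)."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct_basis @ log_mel

# Usage on a synthetic 1-second, 3 kHz tone standing in for a bird-song clip.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 3000.0 * t)
log_mel = log_mel_spectrogram(tone, sr)   # shape (40, 83)
coeffs = mfcc(log_mel)                    # shape (13, 83)
```

Either array can then be treated as a time-frequency input to a convolutional front end. In practice an audio library would normally replace these hand-written helpers.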