From classification to communication: using machine learning to distinguish bird sounds
Po-Han Lee1*, Shih-Kai Hong1, Tian-Rui Chung2, Te-Wei Chen3, Kuan-Lin Chen4, Yin-Kuo Wang5
1Affiliated Senior High School, National Taiwan Normal University, Taipei, Taiwan
2Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
3Mathematical Sciences, National Chengchi University, Taipei, Taiwan
4Taipei Municipal Jianguo High School, Taipei, Taiwan
5Center for General Education and Department of Physics, National Taiwan Normal University, Taipei, Taiwan
* Presenter: Po-Han Lee, email: leepohan@gmail.com
To demonstrate that bird song exhibits different acoustic characteristics across physiological periods and across bird species, we used two datasets: a Japanese white-eye dataset and the XC dataset collected from xeno-canto.org, addressing the tasks of distinguishing physiological periods and bird species, respectively. We divided the Japanese white-eye data into seven physiological periods and selected the 18 most-recorded species among the quality-grade-B recordings on the XC website. Pre-processing consists of noise removal, segmentation, and conversion to time-frequency representations, namely the Mel spectrogram and MFCC. In the experiments, a convolutional module serves as the base and is connected to LSTM, GRU, attention, and BERT-based models; we also combined a feed-forward network with the Conv1D model. The results show that the combination of Conv1D and the BERT-based model performs best, reaching 97.7% and 78.9% (macro F1-score) on the two tasks, respectively. Using t-SNE dimensionality reduction and visualized attention maps, we further show that bird songs from different physiological periods and different species indeed exhibit distinct characteristics.
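The pre-processing pipeline described above (segmenting a recording and converting it to a log-Mel time-frequency representation) can be sketched with NumPy alone. The sample rate, frame length, hop size, and number of Mel bands below are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D waveform into overlapping frames (the 'cutting' step)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def mel_filterbank(n_mels=40, n_fft=1024, sr=22050):
    """Triangular Mel filters mapping linear FFT bins to Mel bands."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft // 2 + 1) * mel2hz(mel_pts) / (sr / 2)).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def log_mel_spectrogram(x, sr=22050, frame_len=1024, hop=512, n_mels=40):
    """Windowed power spectrum per frame, projected onto Mel bands, in log scale."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, fft_bins)
    mel = power @ mel_filterbank(n_mels, frame_len, sr).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10)

# Example: one second of a 2 kHz tone stands in for a bird call.
sr = 22050
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 2000.0 * t), sr=sr)
print(spec.shape)  # (42, 40): time frames by Mel bands
```

MFCC features would follow by applying a discrete cosine transform along the Mel-band axis of this log-Mel output.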
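A Conv1D front end of the kind used as the base module, followed by pooling and a softmax classifier over the 18 species, can be illustrated as a minimal NumPy forward pass. The kernel width, channel counts, and random weights are illustrative assumptions; a real model would be trained end to end in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1-D convolution over time: x is (time, in_ch), kernels is (out_ch, width, in_ch)."""
    out_ch, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, out_ch))
    for t in range(steps):
        window = x[t : t + width]                              # (width, in_ch)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)                                # ReLU activation

def classify(features, n_classes=18):
    """Mean-pool over time, then a linear layer plus softmax over species classes."""
    pooled = features.mean(axis=0)                             # (out_ch,)
    w = rng.normal(size=(n_classes, pooled.size)) * 0.1        # untrained weights
    logits = w @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

# A fake 42-frame, 40-band log-Mel input stands in for a real recording.
spec = rng.normal(size=(42, 40))
features = conv1d(spec, kernels=rng.normal(size=(8, 5, 40)) * 0.1)
probs = classify(features, n_classes=18)
print(probs.shape)  # (18,): one probability per candidate species
```

In the full model, the pooled convolutional features would instead feed an LSTM, GRU, attention, or BERT-based sequence module before classification.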


Keywords: Japanese white-eye, Machine Learning, MFCC, BERT