TF-IDF Implementation with C++

TF-IDF weighting is widely used in text mining. It measures how important a word is to a document within a corpus. Recently I have been working on music recommendation algorithms, and I found that many papers use TF-IDF to measure the lyric similarity between songs. I searched and did not find a TF-IDF library, so I decided to write one myself.
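To make the idea concrete, here is a minimal sketch of the weight itself. It is not the code from the full post: it assumes the common textbook variant (term frequency normalized by document length, IDF as the natural log of N / df, no smoothing), and the names `Document`, `term_frequency`, `inverse_document_frequency`, and `tf_idf` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: a document is already tokenized into lowercase words.
using Document = std::vector<std::string>;

// TF: occurrences of `term` in `doc` divided by the document length.
double term_frequency(const std::string& term, const Document& doc) {
    if (doc.empty()) return 0.0;
    const auto count = std::count(doc.begin(), doc.end(), term);
    return static_cast<double>(count) / static_cast<double>(doc.size());
}

// IDF: log(N / df), where df is the number of documents containing `term`.
// Returns 0 for a term that appears nowhere, to avoid dividing by zero.
double inverse_document_frequency(const std::string& term,
                                  const std::vector<Document>& corpus) {
    std::size_t df = 0;
    for (const auto& doc : corpus) {
        if (std::find(doc.begin(), doc.end(), term) != doc.end()) ++df;
    }
    if (df == 0) return 0.0;
    return std::log(static_cast<double>(corpus.size()) /
                    static_cast<double>(df));
}

// TF-IDF weight of `term` for the document at `doc_index` in `corpus`.
double tf_idf(const std::string& term, std::size_t doc_index,
              const std::vector<Document>& corpus) {
    assert(doc_index < corpus.size());
    return term_frequency(term, corpus[doc_index]) *
           inverse_document_frequency(term, corpus);
}
```

With this variant, a word that appears in every document gets a weight of zero, while a word concentrated in a few documents gets a higher weight. For lyrics, that means filler words shared by all songs contribute nothing to the similarity.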


Scrape Last.fm using Ruby

Recently I have been working on a project analyzing music recommendation algorithms. I found that one of the popular free music datasets is the Last.fm Dataset. However, this dataset only includes users' recently played tracks. Normally, users' rating histories are important for recommendation algorithms. The play histories in the Last.fm Dataset only represent users' implicit ratings, meaning ratings that have to be inferred from signals such as play counts and whether a track was skipped. On the other hand, explicit ratings, which come from deliberate user actions such as marking a song as loved or banned, are not available in this dataset. After some study, I decided to scrape the data myself using the Last.fm API.
