Philippe Khin
Add lyrics to MusixMatch using API: scraping lyric for songs without lyrics
26 December 2016
Leverage programming to contribute to the musical world!
python
selenium
webscraper
musixmatch
sql
api

Recently, I've installed the Musixmatch app on my smartphone, which is an amazing app allowing you to read the lyric of the song you're listening on your phone, and sync lyrics it in real-time. The Musixmatch website also provides a huge catalog of songs lyrics (like millions) in several languages. They also provide an API allowing developers to do research or create other apps using their database .

Since I'm a great fan of Japanese musics, having lyrics displayed while listening is so great to improve my Japanese. But Japanese are not so well known as American one, so many songs that I'm listening to doesn't have lyrics. So, I've played around a little bit with the API and I found that interesting to automatically add lyrics to songs without lyrics.

To do so, I started to search for chart or top artists, using the artist.chart.get API method, then check search for their songs, while checking whether or not they already have lyrics on Musixmatch. Once the song lacking lyric found, I use Selenium to automatically scrape them on J-Lyrics.net by inputting the artist name and the song name.

Here's the Python code :

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests, sys, os
import sqlite3


def scrapeFromJLyric(artist_name, song_name, lyric_site_url):
  print("Scraping song : '{}' by '{}'".format(song_name, artist_name))
  browser.get(lyric_site_url)
  browser.implicitly_wait(1)
  song_name_input   =  browser.find_element_by_name('kt')
  artist_name_input =  browser.find_element_by_name('ka')
  song_name_input.send_keys(song_name)
  artist_name_input.send_keys(artist_name)
  artist_name_input.submit()
  browser.implicitly_wait(1)
  try:
    browser.find_element_by_xpath(".//*[@id='lyricList']/div[2]/div[2]/a").send_keys(Keys.ENTER)
    browser.implicitly_wait(1)
    lyric_text = browser.find_element_by_xpath(".//*[@id='lyricBody']").text
    print ('Song found. Processing...\n')
    print(lyric_text)
  except:
    print ("---------- Song not found. ----------\n")

  return lyric_text


def addLyricToDBByArtistName(db_name, artist_name, api_key):
  api_method_name  = "track.search"
  api_url          = "http://api.musixmatch.com/ws/1.1/" + api_method_name + "?apikey=" + api_key
  conn = sqlite3.connect(db_name)
  c = conn.cursor()
  payload = {
        'format'      : 'json',
        'page'        : 1,
        'page_size'   : 100,
        'q_artist'    : artist_name,
  }
  res = requests.get(api_url, params=payload)
  for i in range(100):
    try:
      track_info = res.json()['message']['body']['track_list'][i]['track']
      if track_info['has_lyrics'] == 0:
        title    = track_info['track_name']
        artist   = track_info['artist_name']
        album    = track_info['album_name']
        track_id = track_info['track_id']
        lyric    = scrapeFromJLyric(artist_name, title, j_lyrics_url)

        c.execute("INSERT INTO songs (title, artist, album, track_id,
                                          lyric) VALUES (?, ?, ?, ?, ?)", (title, artist, album, track_id, lyric))
        conn.commit()
    except:
      pass

  conn.close()



def getChartArtist(country, api_key):
  chart_artist_list = []
  api_method_name   = 'chart.artists.get'
  api_url           = "http://api.musixmatch.com/ws/1.1/" + api_method_name + "?apikey=" + api_key
  payload = {
        'country'     : country,
        'page'        : 1,
        'page_size'   : 100,
        'format'      : 'json',
  }
  res = requests.get(api_url, params=payload)
  chart_artists_info = res.json()['message']['body']['artist_list']

  for i in range(100):
    chart_artist_list.append(chart_artists_info[i]['artist']['artist_name'])

  return chart_artist_list



if __name__ == '__main__':
  browser      = webdriver.PhantomJS(<PATH_TO_YOUR_PHANTOMJS.EXE>)
  api_key      = <YOUR_MUSIXMATCH_API_KEY>
  j_lyrics_url = 'http://j-lyric.net/'
  db_name      = 'songsJP.db'

  chart_artist_list =  getChartArtist('jp', api_key)
  for chart_artist in chart_artist_list:
    addLyricToDBByArtistName(db_name, chart_artist, api_key)

Here I used a Firefox Addon (Firebug) to get the xpath of an element on the J-Lyric HTML page, it's faster and more accurate than trying to write it by yourself.

I then store the lyrics scraped, in a SQLite Database. For better visualization, I found a GUI data browser  which allows you to see the content of your tables and browsing the data quickly and easily, without bothering yourself to make queries inside a ternimal window.

Here's the result of the scaping :

song lyric scraped musixmatch

And here's how the database looks like :

song lyric scraped musixmatch database

So now, I can contribute to Musixmatch by adding programatically lyrics to songs without lyrics !

EDIT: It seems that there's a problem with adding lyrics from Japan. I contacted the Musixmatch staff to solve this problem, so I cannot for the moment neither add lyrics using their API, nor adding manually via the website.

For the complete code, please check my GitHub repository.

© 2020, Philippe Khin