Migrating from LiveJournal to Hexo

2018-04-21

Migrating from LiveJournal posed some problems.

Note that I did not transfer the comments.

At first, livejournal-export seemed the most suitable tool, because it converts posts directly to Markdown. However, it does not import tags. For authentication it uses browser session parameters.

As a second option I considered ljdump. It downloads the blog with the post bodies in HTML and authenticates with a login and password.

The downloaded posts then had to be converted to Markdown. To do this, I wrote the script xml2md.
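
Before the script itself, it may help to see the kind of input it expects. The following is only a sketch: the sample entry is hypothetical, its layout is inferred from the tags that xml2md reads (subject, eventtime, event, props/taglist), and the name of the root element is an assumption. It uses the standard library's xml.etree.ElementTree instead of lxml purely to stay self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entry; the layout is inferred from the tags xml2md reads.
SAMPLE = """<event>
  <subject>Hello</subject>
  <eventtime>2010-05-17 21:30:00</eventtime>
  <event>&lt;p&gt;First &lt;b&gt;post&lt;/b&gt;&lt;/p&gt;</event>
  <props>
    <taglist>linux, blog</taglist>
  </props>
</event>
"""


def read_entry(xml_text):
    """Return the fields xml2md uses as a plain dict (missing fields become '')."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("subject", default=""),
        "date": root.findtext("eventtime", default=""),
        # The post body is HTML stored as escaped text inside <event>.
        "content": root.findtext("event", default=""),
        "tags": root.findtext("props/taglist", default=""),
    }


entry = read_entry(SAMPLE)
print(entry["title"], entry["date"], entry["tags"])
```

The body comes back as an ordinary HTML string, which is what is then passed to html2text in the script.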

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import os
import sys
from lxml import etree
import html2text
from datetime import datetime
from transliterate import translit

outdir = 'markdown'
os.makedirs(outdir, exist_ok=True)

infile = sys.argv[1]
print(infile)

tree = etree.parse(infile)
post = tree.getroot()

postTitle = ''
postTags = ''
postDate = ''
postContent = ''

# Walk the top-level elements of the entry and pick out the fields we need.
for elem in post:
    # Empty elements are represented by the string "None".
    text = elem.text if elem.text else "None"
    if elem.tag == 'subject':
        postTitle = text
    if elem.tag == 'eventtime':
        postDate = text
    if elem.tag == 'event':
        postContent = text
    if elem.tag == 'props':
        # Tags are stored in the <taglist> child of <props>.
        for propsElem in elem:
            text = propsElem.text if propsElem.text else "None"
            if propsElem.tag == 'taglist':
                postTags = text

postHeader = 'layout: post\n'+'title: '+postTitle+'\n'+'date: '+str(postDate)
postHeader = postHeader+'\ncategories: blog\nlang: ru'
postHeader = postHeader+'\ntags: ['+postTags+']'
postHeader = postHeader+'\n'+'---'

h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0
h.unicode_snob = True
postContent = h.handle(postContent)

fullPost = postHeader+'\n\n\n'+postContent

date = datetime.strptime(postDate, '%Y-%m-%d %H:%M:%S')
outFile = '{0.year}-{0.month:02d}-{0.day:02d}'.format(date)
if postTitle != '':
    translitTitle = translit(postTitle, 'ru', reversed=True)
    translitTitle = translitTitle.replace(" ","_")
    translitTitle = translitTitle.replace(":","")
    translitTitle = translitTitle.replace("'","")
    translitTitle = translitTitle.replace("«","")
    translitTitle = translitTitle.replace("»","")
    translitTitle = translitTitle.replace(";","")
else:
    translitTitle = '{0.hour:02d}-{0.minute:02d}-{0.second:02d}'.format(date)
outFile = outFile+'-'+translitTitle

# Write the post as UTF-8 regardless of the system locale.
with open(os.path.join(outdir, outFile + '.md'), 'w', encoding='utf-8') as f:
    f.write(fullPost)
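
The file name logic at the end of the script can also be written with a whitelist instead of a chain of replace calls. This is a minimal sketch, not the script's actual code: the function name make_filename and its parameters are hypothetical, and it assumes the title has already been transliterated to Latin script (for example with translit, as in the script).

```python
import re
from datetime import datetime


def make_filename(post_date, latin_title=""):
    """Build 'YYYY-MM-DD-title.md' from the entry date and a transliterated title."""
    date = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    prefix = date.strftime("%Y-%m-%d")
    # Turn runs of whitespace into "_" ...
    slug = re.sub(r"\s+", "_", latin_title.strip())
    # ... then drop every character that is not a word character or "-".
    slug = re.sub(r"[^\w-]", "", slug)
    if not slug:
        # Untitled posts fall back to the time of day, as in xml2md.
        slug = date.strftime("%H-%M-%S")
    return f"{prefix}-{slug}.md"


print(make_filename("2018-04-21 10:05:09", "Privet: «mir»; it's"))
print(make_filename("2018-04-21 10:05:09"))
```

A whitelist also removes characters that the replace list does not cover, such as "/", which would otherwise end up in the output path.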

Conversion can be performed using the command:

./xml2md <file>

This creates the markdown directory, if it does not already exist, and writes the converted post into it.
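
Since xml2md converts one file per call, a whole dump needs a loop. The following is a rough sketch of such a driver. The names find_entries and convert_all are hypothetical, and the "L-*" file pattern is an assumption about how ljdump names its entry files; adjust it to match the actual dump.

```python
import subprocess
from pathlib import Path


def find_entries(dump_dir, pattern="L-*"):
    """Return the entry files in dump_dir that match pattern, sorted by name."""
    return sorted(p for p in Path(dump_dir).glob(pattern) if p.is_file())


def convert_all(dump_dir, converter="./xml2md"):
    """Run the converter once per entry file and stop on the first failure."""
    for entry in find_entries(dump_dir):
        subprocess.run([converter, str(entry)], check=True)
```

Calling convert_all with the path of the dump directory should then leave one Markdown file per post in the markdown directory.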

Dmitry S. Kulyabov
Professor of the Department of Probability Theory and Cybersecurity
My research interests include physics, Unix administration, and networking.