Migrating from LiveJournal to Hexo

2018-04-21 · 2 min read
blog sysadmin

Migrating from LiveJournal posed some problems.

Note that I did not transfer the comments.

livejournal-export seemed the most suitable tool at first: it converts posts directly to Markdown. However, it does not import tags. It authenticates with browser session parameters.

As a second option I considered ljdump. It downloads the journal as XML files with the post bodies in HTML, and it authenticates with a login and password.

The ljdump output then had to be converted to Markdown. To do this, I wrote the script xml2md:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import os
import sys
from lxml import etree
import html2text
from datetime import datetime
from transliterate import translit

outdir = 'markdown'
os.makedirs(outdir, exist_ok=True)

infile = sys.argv[1]
print(infile)

tree = etree.parse(infile)
post = tree.getroot()

postTitle = ''
postTags = ''

# Walk the top-level elements of the entry and pick out the fields we need.
# Iterating the element directly replaces the deprecated getchildren().
for elem in post:
    if not elem.text:
        text = "None"
    else:
        text = elem.text
    if elem.tag == 'subject':
        postTitle = text
    if elem.tag == 'eventtime':
        postDate = text
    if elem.tag == 'event':
        postContent = text
    if elem.tag == 'props':
        for propsElem in elem:
            # Check the child's own text, not the text of the parent <props>.
            if not propsElem.text:
                text = "None"
            else:
                text = propsElem.text
            if propsElem.tag == 'taglist':
                postTags = text

# Build the post header (front matter); the closing '---' ends the header.
postHeader = 'layout: post\n'+'title: '+postTitle+'\n'+'date: '+str(postDate)
postHeader = postHeader+'\ncategories: blog\nlang: ru'
postHeader = postHeader+'\ntags: ['+postTags+']'
postHeader = postHeader+'\n'+'---'

h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0
h.unicode_snob = True
postContent = h.handle(postContent)

fullPost = postHeader+'\n\n\n'+postContent

date = datetime.strptime(postDate, '%Y-%m-%d %H:%M:%S')
outFile = '{0.year}-{0.month:02d}-{0.day:02d}'.format(date)
if postTitle != '':
    translitTitle = translit(postTitle, 'ru', reversed=True)
    translitTitle = translitTitle.replace(" ","_")
    translitTitle = translitTitle.replace(":","")
    translitTitle = translitTitle.replace("'","")
    translitTitle = translitTitle.replace("«","")
    translitTitle = translitTitle.replace("»","")
    translitTitle = translitTitle.replace(";","")
else:
    translitTitle = '{0.hour:02d}-{0.minute:02d}-{0.second:02d}'.format(date)
outFile = outFile+'-'+translitTitle

# Write the post as UTF-8 regardless of the system locale.
with open(os.path.join(outdir, outFile + '.md'), 'w', encoding='utf-8') as f:
    f.write(fullPost)
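
The chain of replace calls in the script removes only the characters listed there, so a title containing, say, a slash or a question mark would still end up in the file name. As a possible alternative (not what the script above does), the cleanup could be written as a whitelist with regular expressions. This is a minimal sketch: the helper name make_slug is hypothetical, and it assumes the title has already been transliterated to Latin characters.

```python
import re


def make_slug(title: str) -> str:
    """Turn an already transliterated title into a file-name-safe slug.

    Hypothetical helper, not part of the original xml2md script.
    """
    # Collapse every run of whitespace into a single underscore.
    slug = re.sub(r"\s+", "_", title.strip())
    # Drop everything except ASCII letters, digits, underscores and hyphens.
    slug = re.sub(r"[^A-Za-z0-9_-]", "", slug)
    return slug


print(make_slug("Migratsija: iz «LiveJournal» v Hexo"))
```

If the result is empty (a title made only of punctuation), the script would probably want to fall back to the time-based name, just as it does for posts without a title.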

Run the conversion with:

./xml2md <file>

This creates the markdown directory and places the converted post in it.
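
Since the script handles one file per run, a whole dump needs a loop around it. The following is a rough sketch of such a driver, not something from the original setup. It assumes that the entry files in the ljdump directory match the pattern L-* (adjust the pattern if your dump is named differently), and the function names entry_files and convert_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def entry_files(dump_dir, pattern="L-*"):
    """Return the regular files in dump_dir that match pattern, sorted by name.

    The default pattern is an assumption about how the dump names its entries.
    """
    return sorted(p for p in Path(dump_dir).glob(pattern) if p.is_file())


def convert_all(dump_dir, script="./xml2md", pattern="L-*"):
    """Run the converter once per entry file and return the files that failed."""
    failed = []
    for path in entry_files(dump_dir, pattern):
        # A non-zero exit code means the converter crashed on this file.
        result = subprocess.run([script, str(path)])
        if result.returncode != 0:
            failed.append(path)
    return failed


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in convert_all(sys.argv[1]):
        print("failed:", path, file=sys.stderr)
```

Saved under a name of your choice and run with the dump directory as its only argument, it converts every matching file and lists the ones the converter could not process.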

Dmitry S. Kulyabov
Dmitry S. Kulyabov is a professor at the Department of Probability Theory and Cybersecurity of RUDN University. His research interests are in theoretical physics and mathematical modeling.