Migrating from LiveJournal to Hexo

2018-04-21 · 2 min read

Migrating from LiveJournal posed some problems.

I want to note that I did not transfer the comments.

At first, livejournal-export seemed the most suitable tool, since it converts posts directly to Markdown. However, it does not import tags. For authentication it uses browser session parameters.

As a second option I considered ljdump. It downloads the blog as XML files with the post bodies in HTML. For authentication it uses a login and password.

Then I had to convert the downloaded posts to Markdown. To do this, I wrote the script xml2md.

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import os
import sys
from datetime import datetime

import html2text
from lxml import etree
from transliterate import translit

# Converted posts are written to this directory
outdir = 'markdown'
os.makedirs(outdir, exist_ok=True)

# The XML file of a single post is passed as the first argument
infile = sys.argv[1]
print(infile)

tree = etree.parse(infile)
post = tree.getroot()

postTitle = ''
postTags = ''

# Collect the title, date, body and tags from the child elements of the post
for elem in post:
    text = elem.text if elem.text else "None"
    if elem.tag == 'subject':
        postTitle = text
    if elem.tag == 'eventtime':
        postDate = text
    if elem.tag == 'event':
        postContent = text
    if elem.tag == 'props':
        for propsElem in elem:
            # Check the text of the property itself, not of its parent
            propsText = propsElem.text if propsElem.text else "None"
            if propsElem.tag == 'taglist':
                postTags = propsText

# Build the front matter
postHeader = 'layout: post\n'+'title: '+postTitle+'\n'+'date: '+str(postDate)
postHeader = postHeader+'\ncategories: blog\nlang: ru'
postHeader = postHeader+'\ntags: ['+postTags+']'
postHeader = postHeader+'\n'+'---'

# Convert the HTML body to Markdown
h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0       # do not wrap long lines
h.unicode_snob = True  # keep Unicode characters instead of ASCII substitutes
postContent = h.handle(postContent)

fullPost = postHeader+'\n\n\n'+postContent

# File name: the date plus the transliterated title,
# or the time of the post if it has no title
date = datetime.strptime(postDate, '%Y-%m-%d %H:%M:%S')
outFile = '{0.year}-{0.month:02d}-{0.day:02d}'.format(date)
if postTitle != '':
    translitTitle = translit(postTitle, 'ru', reversed=True)
    translitTitle = translitTitle.replace(" ", "_")
    for char in ":'«»;":
        translitTitle = translitTitle.replace(char, "")
else:
    translitTitle = '{0.hour:02d}-{0.minute:02d}-{0.second:02d}'.format(date)
outFile = outFile+'-'+translitTitle

with open(os.path.join(outdir, outFile+'.md'), 'w', encoding='utf-8') as f:
    f.write(fullPost)
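
To make the expected input easier to see, here is a minimal sketch of the field extraction on its own. It uses the standard library's xml.etree.ElementTree instead of lxml, so it runs without extra packages. The sample document is hypothetical: it assumes that the root element of a post is named event and contains only the elements that xml2md reads; a real dump may contain more.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample post; element names follow those read by xml2md.
SAMPLE = """<event>
  <subject>Hello</subject>
  <eventtime>2010-05-17 21:03:00</eventtime>
  <event>&lt;p&gt;First &lt;b&gt;post&lt;/b&gt;&lt;/p&gt;</event>
  <props>
    <taglist>linux, blog</taglist>
  </props>
</event>"""


def extract_fields(xml_text):
    """Return title, date, HTML body and tags; missing fields become ''."""
    root = ET.fromstring(xml_text)
    return {
        'title': root.findtext('subject', default=''),
        'date': root.findtext('eventtime', default=''),
        'content': root.findtext('event', default=''),
        'tags': root.findtext('props/taglist', default=''),
    }


fields = extract_fields(SAMPLE)
for name, value in fields.items():
    print(name, '=>', value)
```

Path lookups with findtext avoid the nested loops and give an empty string for a missing element, which may or may not be what you want for the date field.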

Conversion can be performed using the command:

./xml2md <file>

This creates the markdown directory, if it does not exist yet, and writes the converted post into it.
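
Since the script handles one file per call, a whole dump needs a loop. The following is only a sketch under assumptions: it supposes that post files in the dump directory are named L-<number> (and comments C-<number>), and that xml2md is executable in the current directory. The helper names find_posts and convert_all are made up for this illustration.

```python
import subprocess
from pathlib import Path


def find_posts(dump_dir):
    """Return the post files in dump_dir, sorted by name.

    Assumption: posts are stored as files named L-<number>;
    adjust the pattern if your dump is laid out differently.
    """
    return sorted(p for p in Path(dump_dir).glob('L-*') if p.is_file())


def convert_all(dump_dir, script='./xml2md'):
    """Run the converter once per post and return the files that failed."""
    failed = []
    for post in find_posts(dump_dir):
        result = subprocess.run([script, str(post)])
        if result.returncode != 0:
            failed.append(post)
    return failed
```

Calling convert_all with the path of the dump directory returns the list of files for which the converter exited with an error, so that they can be inspected by hand.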

Dmitry S. Kulyabov
Authors
Professor of the Department of Probability Theory and Cybersecurity
My research interests include physics, Unix administration, and networking.