Migrating from LiveJournal to Hexo
2018-04-21
Migrating from LiveJournal posed a few problems.
Note that I did not transfer the comments.
livejournal-export looked like the most suitable tool: it converts posts straight to markdown. However, it does not import tags, and it authenticates using the session parameters of a logged-in browser.
The second option I considered was ljdump. It downloads the journal as XML files, with the post bodies in HTML, and authenticates with a login and password.
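To show which parts of an ljdump entry matter for the conversion, here is a small stdlib-only sketch that pulls out the four fields the converter described next relies on: subject, eventtime, event (the HTML body) and props/taglist. The sample entry is hypothetical: only the element names are taken from xml2md, the values are invented, and the real files may contain many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the element names are the ones xml2md looks for,
# the values are invented for illustration.
SAMPLE = """<event>
  <eventtime>2010-05-01 12:30:00</eventtime>
  <subject>Пример записи</subject>
  <event>&lt;p&gt;Hello, &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</event>
  <props>
    <taglist>linux, notes</taglist>
  </props>
</event>"""


def read_entry(xml_text):
    """Return the fields xml2md uses from one entry, '' when missing or empty."""
    root = ET.fromstring(xml_text)
    return {
        "subject": root.findtext("subject", default=""),
        "eventtime": root.findtext("eventtime", default=""),
        # the post body: HTML stored as escaped text inside <event>
        "event": root.findtext("event", default=""),
        # the tag list sits one level deeper, under <props>
        "taglist": root.findtext("props/taglist", default=""),
    }


entry = read_entry(SAMPLE)
print(entry["subject"], "|", entry["taglist"])
print(entry["event"])
```

The parser unescapes the body, so `entry["event"]` is ready to be passed to an HTML-to-markdown converter.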
After that the posts still had to be converted to markdown, so I wrote the xml2md script:
```python
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""Convert one ljdump entry (XML) into a Hexo markdown post."""
import json
import os
import sys
from datetime import datetime

import html2text
from lxml import etree
from transliterate import translit

outdir = 'markdown'
os.makedirs(outdir, exist_ok=True)

infile = sys.argv[1]
print(infile)
tree = etree.parse(infile)
post = tree.getroot()

postTitle = ''
postTags = ''
postDate = None
postContent = ''
for elem in post:
    # empty elements have text None; treat them as empty strings
    text = elem.text or ''
    if elem.tag == 'subject':
        postTitle = text
    elif elem.tag == 'eventtime':
        postDate = text
    elif elem.tag == 'event':
        postContent = text
    elif elem.tag == 'props':
        for propsElem in elem:
            if propsElem.tag == 'taglist':
                postTags = propsElem.text or ''

if postDate is None:
    sys.exit('no <eventtime> in ' + infile)

# Hexo front matter; json.dumps quotes the title so ':' cannot break the YAML
postHeader = 'layout: post\n' + 'title: ' + json.dumps(postTitle, ensure_ascii=False)
postHeader += '\ndate: ' + postDate
postHeader += '\ncategories: blog\nlang: ru'
postHeader += '\ntags: [' + postTags + ']'
postHeader += '\n---'

# convert the HTML body to markdown
h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0        # do not hard-wrap lines
h.unicode_snob = True   # keep Unicode characters instead of ASCII substitutes
postContent = h.handle(postContent)
fullPost = postHeader + '\n\n\n' + postContent

# file name: YYYY-MM-DD-<transliterated title>, or the time of day if untitled
date = datetime.strptime(postDate, '%Y-%m-%d %H:%M:%S')
outFile = '{0.year}-{0.month:02d}-{0.day:02d}'.format(date)
if postTitle != '':
    translitTitle = translit(postTitle, 'ru', reversed=True)
    translitTitle = translitTitle.replace(' ', '_')
    # drop characters that are awkward (or, for '/', invalid) in file names
    for ch in ':\'«»;/':
        translitTitle = translitTitle.replace(ch, '')
else:
    translitTitle = '{0.hour:02d}-{0.minute:02d}-{0.second:02d}'.format(date)
outFile = outFile + '-' + translitTitle

with open(os.path.join(outdir, outFile + '.md'), 'w', encoding='utf-8') as f:
    f.write(fullPost)
```
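The output side of the conversion can be checked in isolation. Below is a stdlib-only sketch of the header and file name the script writes. Transliteration is left out because it needs the third-party transliterate package, so the slug is passed in already transliterated (a hypothetical input). The title is quoted with `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar, so a colon in a Russian title such as "Заметка: …" cannot break the front matter.

```python
import json
from datetime import datetime


def front_matter(title, post_date, tags):
    """Hexo front matter in the layout xml2md writes (closing '---' only)."""
    return "\n".join([
        "layout: post",
        # json.dumps yields a double-quoted string that YAML also accepts,
        # so a ':' or a quote inside the title cannot break the header
        "title: " + json.dumps(title, ensure_ascii=False),
        "date: " + post_date,
        "categories: blog",
        "lang: ru",
        "tags: [" + tags + "]",  # the taglist is already comma-separated
        "---",
    ])


def output_name(slug, post_date):
    """YYYY-MM-DD-<slug>.md, falling back to the time of day for untitled posts."""
    date = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    suffix = slug or "{0.hour:02d}-{0.minute:02d}-{0.second:02d}".format(date)
    return "{0.year}-{0.month:02d}-{0.day:02d}-{1}.md".format(date, suffix)


print(front_matter("Заметка: тест", "2010-05-01 12:30:00", "linux, notes"))
print(output_name("Primer_zapisi", "2010-05-01 12:30:00"))
```

Falling back to the time of day keeps untitled posts published on the same date from overwriting each other, unless they were posted in the same second.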
To convert an entry, make the script executable (chmod +x xml2md) and run:
./xml2md <file>
The script creates the markdown directory if it does not exist and writes the converted post into it.
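Since the script handles one file per call, a whole dump needs a loop. A minimal Python sketch, assuming ljdump stores entries as files named `L-<number>` (comments as `C-<number>`, which are skipped here, since comments were not transferred); the `convert_all` helper and its `cmd` parameter are hypothetical, with `cmd` defaulting to `./xml2md` as in the command above.

```python
import subprocess
from pathlib import Path


def convert_all(dump_dir, cmd=("./xml2md",)):
    """Run the converter once per entry file found in an ljdump directory.

    Assumption: entries are files named L-<number>; comment files (C-<number>)
    and everything else are skipped.
    """
    converted = []
    # sorted() builds the full list before any conversion starts
    for path in sorted(Path(dump_dir).glob("L-*")):
        if path.is_file():
            # check=True raises CalledProcessError on the first failing entry
            subprocess.run([*cmd, str(path)], check=True)
            converted.append(path.name)
    return converted
```

Called from the directory that contains xml2md, for example as `convert_all("ljdump/username")` (a hypothetical path), it converts every entry and returns the names of the processed files. Stopping on the first failure makes a malformed entry easy to spot instead of silently skipping it.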
Authors
Professor of the Department of Probability Theory and Cybersecurity
My research interests include physics, Unix administration, and networking.