I've always wanted to be able to save my Steem posts locally. Once they are on disk, much better search tools are available than the ones we have at the blockchain level.
I've only just started poking around the Steem development APIs, and this is the first Python script with a real purpose that I've written. On top of that, I'm also fairly new to Ubuntu. :)
If you are a dev and have been doing this for a while, you can probably write a more efficient script.
I wasn't aiming for efficiency when I wrote it. I wanted to learn, and maybe to help others who are also Python beginners or who haven't yet tried coding against the Steem APIs. Hence the extensive comments.
Features and options:
- saves all your Markdown posts as .md files
- saves all your raw HTML posts as .html files
- lets you set a main subdirectory (or sub-path) of the current directory where the files will be placed
- places posts in subdirectories by creation date (year-month) or by primary tag, chosen with an option at the beginning of the script
- can save the posts of any account
- can optionally save resteemed posts as well
- can optionally add the post's tags at the end of each saved post
- automatically adds the title as an H1 heading at the beginning of each post
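The features above boil down to two decisions per post: where the file goes and what goes into it. Here is a minimal sketch of that logic as a pure function, so it can be tried without touching the blockchain. The name `render_post` is hypothetical, and I'm assuming `comment` holds the same fields the script reads from `blogs[0]['comment']` (`category`, `created`, `permlink`, `title`, `body`, `json_metadata`). The path is built with `pathlib.PurePosixPath`, so it uses `/` separators like the script does.

```python
import json
from pathlib import PurePosixPath


def render_post(comment, main_save_dir, dir_struct_option='year-month', add_tags=True):
    """Return (relative_path, content) for one post, mirroring the script's rules.

    `comment` is assumed to hold the fields the script reads from
    blogs[0]['comment']: category, created, permlink, title, body and
    json_metadata (a JSON string).
    """
    # subdirectory: primary tag, or the 'YYYY-MM' prefix of the creation date
    if dir_struct_option == 'primary-tag':
        subdir = PurePosixPath('tags', comment['category'])
    elif dir_struct_option == 'year-month':
        subdir = PurePosixPath('date', comment['created'][:7])
    else:
        raise ValueError('unknown dir_struct_option: ' + dir_struct_option)

    # json_metadata can be empty or malformed; treat that as "no metadata"
    try:
        meta = json.loads(comment['json_metadata'])
    except ValueError:
        meta = {}
    if not isinstance(meta, dict):
        meta = {}

    post_format = meta.get('format', 'markdown+html')  # same default as the script
    tags = meta.get('tags', []) if add_tags else []

    if post_format in ('markdown', 'markdown+html'):
        ext = '.md'
        tags_str = ('\n\n' + ' '.join('#' + t for t in tags)) if tags else ''
        content = '# ' + comment['title'] + '\n\n' + comment['body'] + tags_str
    else:
        ext = '.html'
        links = ['<a id="{0}" href="#{0}">{0}</a>'.format(t) for t in tags]
        tags_str = ('\n\n' + ' '.join(links)) if tags else ''
        content = '<h1>' + comment['title'] + '</h1>\n\n' + comment['body'] + tags_str

    path = PurePosixPath(main_save_dir) / subdir / (comment['permlink'] + ext)
    return str(path), content


# usage with a made-up post
example = {
    'category': 'python',
    'created': '2019-08-15T10:20:30',
    'permlink': 'my-first-script',
    'title': 'My first script',
    'body': 'Hello Steem!',
    'json_metadata': '{"tags": ["python", "steem"], "format": "markdown"}',
}
example_path, example_content = render_post(example, 'steem-posts-testuser123')
print(example_path)  # steem-posts-testuser123/date/2019-08/my-first-script.md
```

Keeping this logic separate from the network calls means you can check the file layout on a few fake posts before downloading a whole blog.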
I've tested the script on Python 3.7.4, and I believe it should also work on earlier Python 3 versions. The script was written for Linux/Ubuntu. On Windows you may need to adapt the parts of the script that handle paths and create directories.
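If you'd rather not adapt the path handling by hand on Windows, one option is to build paths with the standard library's `pathlib`. This is my suggestion, not what the script does. `Path` joins components with the separator of the current OS, and `mkdir(parents=True, exist_ok=True)` creates any missing parent directories without raising `FileExistsError` for ones that already exist. Here is a minimal sketch. The helper names `ensure_post_dir` and `save_post` are hypothetical.

```python
from pathlib import Path


def ensure_post_dir(main_save_dir, *parts):
    """Create main_save_dir/parts... (including missing parents) and return it.

    Path joins the components with the current OS's separator, so the same
    call works on Linux and Windows. exist_ok=True turns an already existing
    directory into a no-op instead of an error.
    """
    target = Path(main_save_dir).joinpath(*parts)
    target.mkdir(parents=True, exist_ok=True)
    return target


def save_post(directory, filename, text):
    """Write text to directory/filename as UTF-8, overwriting any existing file."""
    path = Path(directory) / filename
    path.write_text(text, encoding='utf-8')
    return path
```

For example, `save_post(ensure_post_dir('steem-posts-testuser123', 'date', '2019-08'), 'my-first-script.md', text)` would replace both the `os.makedirs` block and the `open`/`write` block for one post.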
You will also need a good Markdown viewer/editor to view the saved files. I used Typora, but it looks like it will become paid software once it leaves beta, so a good free alternative would be nice.
So, here's the Python script. Note that the settings are hard-coded, so you'll have to change them manually at the top of the script. It uses the steem-python library (`from steem import Steem`), which you can install with `pip install steem`.
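Before the script itself, a side note: if editing the hard-coded values for every account gets tedious, the same settings could come from the command line instead. This is a sketch using the standard library's `argparse`, not something the script does. `parse_settings` and the option names are my own hypothetical choices, and each one maps onto one of the script's variables (`author_name`, `main_save_dir`, `dir_struct_option`, `adding_tags_to_saved_post`, `include_resteem_posts`).

```python
import argparse


def parse_settings(argv=None):
    """Parse the script's settings from the command line (argv=None uses sys.argv)."""
    parser = argparse.ArgumentParser(description='Save Steem posts locally.')
    parser.add_argument('author_name', help='account whose blog will be saved')
    parser.add_argument('--save-dir', dest='main_save_dir',
                        help='relative directory for the posts (default: steem-posts-<author>)')
    parser.add_argument('--dir-struct', dest='dir_struct_option',
                        choices=['year-month', 'primary-tag'], default='year-month',
                        help='subdirectory layout for the saved posts')
    # store_false: tags are added unless --no-tags is given
    parser.add_argument('--no-tags', dest='adding_tags_to_saved_post',
                        action='store_false', help="don't append tags to each post")
    # store_true: resteems are skipped unless --include-resteems is given
    parser.add_argument('--include-resteems', dest='include_resteem_posts',
                        action='store_true', help='also save resteemed posts')
    args = parser.parse_args(argv)
    # keep the script's original default for the save directory
    if args.main_save_dir is None:
        args.main_save_dir = 'steem-posts-' + args.author_name
    return args
```

With that in place, a call like `python save_posts.py testuser123 --dir-struct primary-tag --include-resteems` would replace the hand edits. The filename `save_posts.py` is only an example. The returned object's attributes use the script's variable names, so the rest of the code can read `args.author_name` and so on.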
I'm far from a Python or Steem dev expert, but if you have questions, let me know.
Feedback from more experienced devs is welcome as well. :)
```python
import os
import json
import time

from steem import Steem  # steem-python library

s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
# primary-tag - posts are saved under their primary tag subdirectory
# year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine if resteemed posts of other authors are saved as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters

# create the main save directory (as a subdirectory or sub-path of the current directory)
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + os.getcwd())
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + os.getcwd())
except OSError:
    print('Directory ' + main_save_dir + ' couldn\'t be created in current directory ' + os.getcwd())
    raise

# loop through all the posts of the given author;
# we break out of the loop after we reach the last post of the author
i = 1
while True:
    # retrieve the current blog post info
    # theoretically we can retrieve more than one blog entry per call, but in my tests
    # anything more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print('Couldn\'t get blog #' + str(i) + '. Trying again. Ctrl+C to interrupt.')
        time.sleep(1)  # short pause so we don't hammer the node
        continue

    # is it empty? then we reached the end and we should break out of the loop
    if not blogs:
        break

    comment = blogs[0]['comment']

    # is it the author's post or a resteem?
    # if it's a resteem and resteems are not to be included, skip to the next post
    if comment['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + comment['author'] + '. Skipping it.')
            i += 1
            continue
        print('Post #' + str(i) + ' author is ' + comment['author'] + '. Including it.')

    # choose the name of the subdir where to place the saved posts
    # (i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = 'tags/' + comment['category']
    elif dir_struct_option == 'year-month':
        # 'created' looks like '2019-08-15T10:20:30'; the first 7 characters are 'YYYY-MM'
        subdir_name = 'date/' + comment['created'][0:7]
    else:
        raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)

    # subdirectory path, relative to the current directory
    dir_name = main_save_dir + '/' + subdir_name

    # create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + ' couldn\'t be created.')
        raise

    # deserialize json_metadata (it can be empty or malformed on some posts)
    try:
        json_metadata_dict = json.loads(comment['json_metadata'])
    except ValueError:
        json_metadata_dict = {}
    if not isinstance(json_metadata_dict, dict):
        json_metadata_dict = {}

    post_format = json_metadata_dict.get('format')
    if post_format is None:
        print('No format key in json_metadata. Defaulting to "markdown+html".')
        post_format = 'markdown+html'

    # tags to add at the end of the post (empty if disabled or missing)
    tags = json_metadata_dict.get('tags', []) if adding_tags_to_saved_post else []
    title = comment['title']
    body = comment['body']

    # is the post markdown?
    if post_format in ('markdown+html', 'markdown'):
        # filename is the blog post's permlink + ".md" extension
        filename = comment['permlink'] + '.md'
        # tags as "#tag" at the end of the post
        tags_str = ('\n\n' + ' '.join('#' + x for x in tags)) if tags else ''
        # title at the beginning as H1, tags at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str
    # or is the post raw html?
    else:
        # filename is the blog post's permlink + ".html" extension
        filename = comment['permlink'] + '.html'
        # tags as anchor links at the end of the post
        tag_links = ['<a id="' + x + '" href="#' + x + '">' + x + '</a>' for x in tags]
        tags_str = ('\n\n' + ' '.join(tag_links)) if tags else ''
        # title at the beginning as <h1>, tags at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    # write post to file as UTF-8 (overwrite if it exists)
    file_path = dir_name + '/' + filename
    try:
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + file_path + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + file_path)
        raise

    i += 1

print('No (more) posts.')
```
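One thing I'd change in the loop above: if the node keeps failing, the `except Exception:` branch with `continue` retries the same post forever. Below is a sketch of a bounded retry with a growing pause. `fetch_with_retry` and its `attempts`/`base_delay` parameters are hypothetical names. The helper takes the fetching function as an argument, so it can wrap `s.get_blog` without depending on steem-python itself.

```python
import time


def fetch_with_retry(fetch, *args, attempts=5, base_delay=1.0):
    """Call fetch(*args), retrying up to `attempts` times on any exception.

    The pause doubles after each failure (1s, 2s, 4s, ... with the defaults).
    If every attempt fails, the last exception is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(*args)
        except Exception as exc:
            if attempt == attempts:
                raise  # give up: let the caller see the real error
            delay = base_delay * 2 ** (attempt - 1)
            print('Attempt {} failed ({}). Retrying in {:.1f}s.'.format(attempt, exc, delay))
            time.sleep(delay)
```

Inside the loop this would become `blogs = fetch_with_retry(s.get_blog, author_name, i, 1)`. A persistent failure then stops the script with the node's error instead of looping until Ctrl+C.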
Update: I edited the post because the original version contained some errors introduced when I copy-pasted the code into HTML, which I hadn't tested at first.