Go4Hive

Python Script to Save All Your Posts As MarkDown or Html Files

BY: @gadrian | CREATED: March 9, 2020, 6:24 p.m. | VOTES: 147 | PAYOUT: $7.20

I always wanted to be able to save my Steem posts locally. Once they are saved as files, better search tools are available than the ones we have at the blockchain level.

I have only started poking around the development APIs for Steem, and this is the first script with a real purpose I've done in Python. On top of that, I'm also kind of new to Ubuntu. :)

If you are a dev and have been doing this for a while, you probably can write a more efficient script.

I wasn't looking for efficiency when I wrote it; I was interested in learning, and maybe others who are also Python beginners or haven't tried coding with the Steem APIs can learn from it too. Hence the extensive comments.

Features and options:

saves all your markdown posts as .md files
saves all your raw HTML posts as .html files
you can set a main sub-directory or sub-path in the current directory where the files will be placed
posts will be placed in subdirectories based on the creation date (year-month) or primary tag - option to set at the beginning of the script
you can save the posts for any account
you can save resteemed posts as well or not
you can add tags at the end of the post or not
title is automatically added as H1 at the beginning of the post

I've tested the script on Python 3.7.4, but I believe it should work on earlier Python 3 versions. Also, the script is written for Linux/Ubuntu; on Windows you will need to adapt the parts of the script that handle paths and directory creation.
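For the path handling mentioned above, one portable option is to build the directories with pathlib instead of joining strings with "/". This is only a minimal sketch: the helper name post_dir is hypothetical and not part of the script, and the sample 'created' value is an assumed timestamp shape.

```python
from pathlib import Path


def post_dir(main_save_dir, dir_struct_option, comment):
    """Return the directory for one post.

    Hypothetical helper (not in the original script); `comment` is the
    dict found under blogs[0]['comment'].
    """
    if dir_struct_option == 'primary-tag':
        sub = Path('tags') / comment['category']
    elif dir_struct_option == 'year-month':
        # the first 7 characters of 'created' are taken as year-month
        sub = Path('date') / comment['created'][:7]
    else:
        raise ValueError('unknown dir_struct_option: ' + dir_struct_option)
    # Path picks the right separator for the current operating system
    return Path(main_save_dir) / sub


# sample data, made up for illustration
sample = {'category': 'python', 'created': '2020-03-09T18:24:00'}
print(post_dir('steem-posts-testuser123', 'year-month', sample))
```

Calling .mkdir(parents=True, exist_ok=True) on the returned path would then create the whole sub-path in one step and not complain if it already exists.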

You will also need a good Markdown viewer/editor to see the saved files. I used Typora, but it looks like it will become paid software when it leaves beta, so a good free alternative would be nice.

So, here's the Python script. Note that the settings are hard-coded, so you'll have to change them manually.

While I'm far from a Python or Steem dev expert, if you have questions let me know.

Feedback for improvement from more experienced devs is welcome as well. :)

import os
import sys
import json
from steem import Steem

s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
#   primary-tag - posts are saved under their primary tag subdirectory
#   year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine whether to save resteemed posts of other authors as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters

# create main save directory (as a subdirectory or sub-path of the current directory)
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + os.curdir)
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + os.curdir)
except OSError:
    print('Directory ' + main_save_dir + " couldn't be created in current directory " + os.curdir)

# save current dir
cur_dir_saved = os.curdir

# loops through all the posts of the given author
# we break out of the loop after we reach the last post of the author
i = 1
while True:
    # retrieve current blog post info
    # theoretically we can retrieve more than one blog per call; in my tests anything
    # more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print("Couldn't get blog #" + str(i) + '. Trying again. Ctrl+C to interrupt.')
        continue

    # is it empty? then we reached the end and we should break out of the loop
    if blogs == []:
        break

    # is it the author's post or a resteem?
    # if it's a resteem and resteems are not to be included, continue with the next iteration
    if blogs[0]['comment']['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Skipping it.')
            i += 1
            continue
        else:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Including it.')

    # choose the name of the subdir where to place the saved posts
    # (i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = 'tags/' + blogs[0]['comment']['category']
    elif dir_struct_option == 'year-month':
        subdir_name = 'date/' + blogs[0]['comment']['created'][0:7]

    # build the full path of the subdir
    if cur_dir_saved == '.':
        dir_name = main_save_dir + '/' + subdir_name
    elif cur_dir_saved == '/':
        dir_name = cur_dir_saved + main_save_dir + '/' + subdir_name
    else:
        dir_name = cur_dir_saved + '/' + main_save_dir + '/' + subdir_name

    # create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + " couldn't be created.")
        # re-raise the original error so its details are not lost
        raise

    # deserialize json_metadata
    json_metadata_str = blogs[0]['comment']['json_metadata']
    json_metadata_dict = json.loads(json_metadata_str)
    try:
        # named post_format to avoid shadowing the built-in format()
        post_format = json_metadata_dict['format']
    except KeyError:
        print('Broken blog json before format key. Defaulting to "markdown+html".')
        post_format = 'markdown+html'

    # is the post markdown?
    if post_format == 'markdown+html' or post_format == 'markdown':
        # choose the filename as the blog post's permlink + ".md" extension
        filename = blogs[0]['comment']['permlink'] + '.md'

        if adding_tags_to_saved_post:
            # get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '#' + x + ' '
            except KeyError:
                tags_str = ''
        else:
            tags_str = ''

        # get post body
        body = blogs[0]['comment']['body']
        # get post title
        title = blogs[0]['comment']['title']
        # format the body to also include title at the beginning as H1 and tags (with #) at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str

    # or is the post raw html?
    else:
        # choose the filename as the blog post's permlink + ".html" extension
        filename = blogs[0]['comment']['permlink'] + '.html'

        if adding_tags_to_saved_post:
            # get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '<a id="' + x + '" href="#' + x + '">' + x + '</a> '
            except KeyError:
                tags_str = ''
        else:
            tags_str = ''

        # get post body
        body = blogs[0]['comment']['body']
        # get post title
        title = blogs[0]['comment']['title']
        # format the body to also include title at the beginning as H1 and tags (as links) at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    # write post to file (overwrite if exists)
    try:
        # utf-8 is set explicitly so non-ASCII posts are written the same way on every system
        with open(dir_name + '/' + filename, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + dir_name + '/' + filename + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + dir_name + '/' + filename)
        raise

    i += 1

print('No (more) posts.')
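The markdown and HTML branches of the script differ only in the file extension, the heading markup and the way tags are rendered, so they could be folded into one function. The following is a sketch under that assumption, not the script's actual structure: render_post is a hypothetical name, and unlike the script it joins tags with single spaces and adds no trailing blank lines when there are no tags.

```python
def render_post(comment, metadata, add_tags=True):
    """Return (filename, text) for one post.

    Hypothetical helper: `comment` is the dict under blogs[0]['comment'],
    `metadata` is the already-parsed json_metadata dict.
    """
    # same default as the script when the 'format' key is missing
    post_format = metadata.get('format', 'markdown+html')
    tags = metadata.get('tags', []) if add_tags else []

    if post_format in ('markdown+html', 'markdown'):
        extension = '.md'
        heading = '# ' + comment['title']
        tag_items = ['#' + t for t in tags]
    else:
        extension = '.html'
        heading = '<h1>' + comment['title'] + '</h1>'
        tag_items = ['<a id="' + t + '" href="#' + t + '">' + t + '</a>' for t in tags]

    # tags go on their own paragraph at the end, only if there are any
    tags_str = '\n\n' + ' '.join(tag_items) if tag_items else ''
    text = heading + '\n\n' + comment['body'] + tags_str
    return comment['permlink'] + extension, text


# sample data, made up for illustration
sample_comment = {'title': 'Hello', 'body': 'Text', 'permlink': 'hello'}
filename, text = render_post(sample_comment, {'format': 'markdown', 'tags': ['python']})
print(filename)
print(text)
```

Keeping the rendering separate from the download loop also makes it possible to try it on made-up data without touching the blockchain.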

Update: I edited the post because the original had some errors introduced when I copy-pasted the code into HTML, which I hadn't tested initially.

TAGS: [ #python ] [ #save-allposts ] [ #markdown ] [ #html ] [ #technology ] [ #neoxian ] [ #palnet ]

Replies

@olaf123 | March 9, 2020, 6:32 p.m. | Votes: 0

According to the Bible, Graven Images: Should You Worship These According to the Bible?

Watch the Video below to know the Answer...

(Sorry for sending this comment. We are not looking for our self profit, our intentions is to preach the words of God in any means possible.)
https://youtu.be/vJWTMjWmdMQ
Comment what you understand of our Youtube Video to receive our full votes. We have 30,000 #SteemPower. It's our little way to Thank you, our beloved friend.
Check our Discord Chat
Join our Official Community: https://steemit.com/created/hive-182074

@the-real-jesus | March 9, 2020, 8:46 p.m. | Votes: 1

My name is Jesus Christ and I do not condone this spamming in my name. Your spam is really fucking annoying @hiroyamagishi aka @overall-servant aka @olaf123 and your spam-bot army. This is not what my father, God, created the universe for. You must stop spamming immediately or I will make sure that you go to hell.

If anybody wants to support my eternal battling of these relentless religion spammers, please consider upvoting this comment or delegating to @the-real-jesus

@maxsieg | March 10, 2020, 2:47 p.m. | Votes: 2

why use steem module and not beem module? many steem features are no longer up to date.
https://github.com/holgern/beem
beem is a bit more up to date. what i noticed though, when the api.steemit.com site was down, is that even if you specify a different node it still relies on the steemit node, so when installing it from github you first have to replace all api.steemit.com in the source code with a different API node you trust.

also i've been trying to write posts and upvote by sending the API requests directly with the requests module, so i can update my code more flexibly and not rely on another steem user, but i haven't figured out yet how to correctly format the broadcast operation and i haven't found anyone yet willing to help me....

but here is what i have for example to get the blog posts from your steem account:

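import json
import requests

def query(node, data, tor):
    # POST a JSON-RPC payload to the node, optionally through a local Tor SOCKS proxy
    headers = {'Content-Type': 'application/json'}
    if not tor:
        return requests.post(node, headers=headers, data=data)
    else:
        # socks5:// proxies need the optional SOCKS support: pip install requests[socks]
        # 9050 is the default port of the standalone tor service; Tor Browser usually listens on 9150
        session = requests.session()
        session.proxies = {'http': 'socks5://127.0.0.1:9050', 'https': 'socks5://127.0.0.1:9050'}
        # the session already carries the proxies, so no extra proxy argument is passed
        return session.post(node, headers=headers, data=data)

def get_blog(name, nod, tor, start, end):
    # builds the condenser_api.get_blog request and returns the first entry of the result
    querry = '{"jsonrpc":"2.0", "method":"condenser_api.get_blog", "params":["' + name + '",' + str(start) + ',' + str(end) + '], "id":1}'
    return dict(dict(json.loads(query(nod, querry, tor).text))["result"][0])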

i haven't tested yet whether the tor function works (and i see now that i made some mistakes in it), but with the tor browser open, sending the traffic over localhost port 9050 would usually route it through tor.
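The request string in the snippet above is put together by concatenation, which breaks as soon as a value contains a quote. A small sketch of the same request built with json.dumps follows; build_get_blog_payload is a hypothetical helper name, and reading the three parameters as account, start entry id and limit is an assumption based on how get_blog is called in the post's script.

```python
import json


def build_get_blog_payload(account, start_entry_id, limit, request_id=1):
    """Serialize a condenser_api.get_blog JSON-RPC request.

    Hypothetical helper; the meaning of the three params is an assumption.
    """
    return json.dumps({
        'jsonrpc': '2.0',
        'method': 'condenser_api.get_blog',
        'params': [account, start_entry_id, limit],
        'id': request_id,
    })


# json.dumps takes care of quoting and escaping
payload = build_get_blog_payload('testuser123', 0, 1)
print(payload)
```

The resulting string can be passed as the data argument of a POST request to whichever node is used.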

if someone would be so kind as to help me out with how to correctly write a vote broadcast operation, i would be very grateful

@gadrian | March 10, 2020, 3:23 p.m. | Votes: 0

> why use steem module and not beem module? many steem features are no longer up to date.

I haven't seen Holger in a while. Will he or someone else keep updating beem? Not that there's anyone updating Steem APIs at Steemit, Inc. now.

You're already more experienced in Python and Steem/beem APIs than I am. Maybe you'll receive some guidance from someone who is even more experienced...

@sathyasankar | March 10, 2020, 3:41 p.m. | Votes: 1

Great.. I will try this out.

@petertag | March 10, 2020, 5:38 p.m. | Votes: 2

As a note, I use VS Code (because I'm a dev, I guess) with an extension to preview .md files as I write them (basically like writing a post with preview). There are probably similar free apps that aren't as massive as VS Code, though.

@gadrian | March 10, 2020, 5:59 p.m. | Votes: 0

Yes, I used VS Code to write this Python script as well. Didn't try it for md though, but I will. Thanks for mentioning it.

@petertag | March 10, 2020, 6:05 p.m. | Votes: 0

Just checked it, I was using Markdown Preview Enhanced for the extension, looks like there are a few though. No problem, nice script man!

@gadrian | March 10, 2020, 6:07 p.m. | Votes: 0

Great, I'll check it out. Thanks again!

@steemitboard | March 16, 2020, 4:37 p.m. | Votes: 0

[IMAGE: https://steemitimages.com/175x175/http://steemitboard.com/@gadrian/level.png?202003161601]
@gadrian, sorry to see you have less Steem Power.
Your level lowered and you are now a Red Fish!

Vote for @Steemitboard as a witness to get one more award and increased upvotes!
