Recently I have been converting a large number of old blog posts to markdown, and have been writing a bunch of little one off scripts to handle this. One of the things I needed to do was convert a bunch of oddly formatted HTML links to their shiny clean markdown equivalent.
#!/usr/bin/env ruby # Convert HTML links to Markdown links ARGF.each do |line| puts line.gsub /<a [^h]*href=["']([^"']*)["'][^>]*>([^<]*)<\/a>/, "[\\2](\\1)" end
My script follows the standard unix convention, and reads from standard in or a file, and outputs to standard out. To use it you could do something like:
ruby html_links_to_markdown.rb <file.html> > file.markdown
I should note, there are a few minor caveats. It will not:
<a href="#"><imr src="fail.png" /></a>
<a height="4" href="#">fail</a>
Those cases should just result in the original ugly links remaining in the document, which can be manually fixed. I thought it was helpful, and handled many of the badly formatted >a< tags that haunted my old posts.