wikimini - Wikipedia-to-Gemtext converter

Age	Commit message	Author
2021-09-27	remove walrus (more-tests)	Daniel Schadt
	This makes wikimini work with Python below 3.8.
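As an illustration of the kind of rewrite this implies: the assignment expression (`:=`) only exists in Python 3.8 and later, so code that uses it has to assign first and test afterwards. The pattern and function below are hypothetical and not taken from wikimini.

```python
import re

# Hypothetical pattern, used only to have something to match against.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+$")


def heading_text(line):
    """Return the text of a wikitext-style heading, or None."""
    # On Python 3.8+ this could be written as:
    #     if (match := HEADING.match(line)):
    # The portable form assigns first and tests afterwards.
    match = HEADING.match(line)
    if match:
        return match.group(2)
    return None
```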
2021-09-15	add more tests for extract_plaintext	Daniel Schadt

2021-09-05	add more tests for wikimini.document	Daniel Schadt

2021-08-26	start working on more tests	Daniel Schadt

2021-08-26	remove unnecessary >>> from documentation examples (HEAD, master)	Daniel Schadt

2021-08-26	Pass args to pytest	Daniel Schadt

2021-08-26	lint fixes	Daniel Schadt

2021-08-24	Fix BlockQuote.to_content/to_nodes	Daniel Schadt

2021-08-24	make LineBreak inherit from Block	Daniel Schadt
	This was an oversight when LineBreak was introduced
2021-08-24	make sure to include all subpackages in setup.py	Daniel Schadt

2021-08-24	add sphinx tox target	Daniel Schadt
	This allows us to build the documentation using tox -e sphinx
2021-08-24	fix empty return of tmpl_quote	Daniel Schadt

2021-08-23	Add basic infrastructure for tests & linters	Daniel Schadt
	We use tox as the runner, which allows us to easily do commands like tox, tox -e pylint, or tox -e mypy to run the different tests/linters.
2021-08-22	Remove unneeded import	Daniel Schadt

2021-08-22	Add some initial documentation	Daniel Schadt

2021-08-21	More type fixes	Daniel Schadt
	This makes mypy happier about how we use .extend, since it doesn't know about the fact that all list items should be either Node or Block, not mixed. In the first case, we can fix it by using .append(current[0]), as mypy will see the isinstance() above and ascribe the correct type. In the second case, we use the cast no-op function to assign the correct type. That should be cheap and makes mypy happy, even if it doesn't give us any runtime checks (which is fine, this is Python after all). In the third case, we're iterating over every element anyway, so we can throw in an assert to make sure the list item has the correct type. This also helps in catching some bugs, in case .convert() returns a mixed list (but only if the first item happens to be a Node).
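A minimal sketch of the three techniques described above, assuming stand-in `Node` and `Block` classes. The function names are hypothetical and do not mirror wikimini's actual code.

```python
from typing import List, Union, cast


class Node:
    """Stand-in for an inline element (hypothetical)."""


class Block:
    """Stand-in for a block-level element (hypothetical)."""


Item = Union[Node, Block]


def take_single_node(current: List[Item], out: List[Node]) -> None:
    # Case 1: the isinstance() check lets mypy narrow current[0] to Node,
    # so .append(current[0]) type-checks where .extend(current) would not.
    if isinstance(current[0], Node):
        out.append(current[0])


def as_nodes(current: List[Item]) -> List[Node]:
    # Case 2: cast() returns its argument unchanged. It only informs the
    # type checker and performs no runtime check.
    return cast(List[Node], current)


def extend_checked(current: List[Item], out: List[Node]) -> None:
    # Case 3: we iterate over every element anyway, so an assert both
    # narrows the type for mypy and catches mixed lists at runtime.
    for item in current:
        assert isinstance(item, Node), "expected only Node items"
        out.append(item)
```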
2021-08-21	Move table parsing out to separate function	Daniel Schadt
	Just to clean up the code a bit :-)
2021-08-21	Remove Document return type from convert	Daniel Schadt
	It doesn't really make sense to make the output that dependent on the input type. convert_to_document basically does the same - ensuring that the output is a Document. Templates that call convert recursively can also call convert_to_document. Furthermore, since we now pass very short Wikicode into convert_to_document, there was an exception being raised if there were fewer than 2 nodes. This seems to be because of a "smart list" that mwparserfromhell uses internally, which didn't properly slice if the end index is out of bounds.
2021-08-21	Type fixes in templates.Template	Daniel Schadt
	Since we switched to our internal document representation, the templates don't return the plain nodes anymore.
2021-08-21	Allow passing a custom template registry	Daniel Schadt

2021-08-21	generalize File: stripping	Daniel Schadt
	This has two reasons: First, there's more than just File: we might want to strip. Category: was another example, but there's more - User:, Help:, ... Using \w+ should catch them all. Secondly, and maybe more importantly, different languages have their namespaces localized as well. For example, in German, we have Datei: instead of File:, or Kategorie: instead of Category:. This fix makes the stripping work properly there as well. One future change that might have to be done is to expand the regex to catch namespaces with a space/underscore in it.
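A small sketch of the idea, assuming the check is applied to a link target string. The names `NAMESPACE_PREFIX` and `has_namespace_prefix` are hypothetical, and the real pattern in wikimini may differ. In Python 3, `\w` matches Unicode word characters for `str` patterns, so localized namespace names are covered as well.

```python
import re

# Hypothetical pattern: any run of word characters followed by a colon.
NAMESPACE_PREFIX = re.compile(r"^\w+:")


def has_namespace_prefix(target):
    """Return True if the link target starts with a namespace-like prefix."""
    return NAMESPACE_PREFIX.match(target) is not None
```

As the commit message notes, a namespace containing a space or underscore separator (for example "User talk:") is not caught by `\w+` when written with a space, which is the possible future change mentioned above.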
2021-08-20	Merge branch 'document-repr'	Daniel Schadt

2021-08-20	strip template name before looking it up	Daniel Schadt
	Some templates seem to be invoked with a trailing space at the end of the name, which we need to strip before searching our template registry.
2021-08-20	fix handling of link items with trailing plural s	Daniel Schadt

2021-08-20	properly strip File: links that got through	Daniel Schadt

2021-08-20	implement Gemtext format	Daniel Schadt

2021-08-20	Rework ItemList/BlockQuote to hold Paragraph	Daniel Schadt
	A List[Node] is basically a Paragraph, and we already delegated some of the methods to Paragraph (see ItemList.cleanup). Therefore, it only made sense to rework ItemList and BlockQuote to hold a Paragraph instead of a List[Node].
2021-08-20	add Format base class	Daniel Schadt

2021-08-20	Rename Blockquote to BlockQuote	Daniel Schadt
	This keeps it more in line with BlockLink. Also, this adds a LineBreak block, which templates can use to enforce extra line breaks.
2021-08-19	implement style fixes suggested by pycodestyle	Daniel Schadt

2021-08-19	Add an internal Document representation	Daniel Schadt
	Doing everything on strings is kinda wonky, so this adds an intermediate representation. The idea behind this is that the pipeline now goes Wikicode [1]-> Document [2]-> Output String, where step 1 takes care of templates and everything, and step 2 does the actual output formatting. This has the benefit that we can support multiple output types, some with more and some with less features (e.g., adding a Markdown output which keeps some of the original formatting intact), and it has the benefit of being less wonky (no hacks with "<!NUM!>" for numbered lists, more streamlined formatting with newlines, ...).
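A rough sketch of step 2 of that pipeline (Document to output string), assuming a document is simply a list of block objects. The classes and the `to_gemtext` function are hypothetical simplifications, not wikimini's actual API, and the way ordered lists are numbered here is an assumption. Step 1 is omitted because it depends on the Wikicode parser.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Heading:
    level: int
    text: str


@dataclass
class Paragraph:
    text: str


@dataclass
class ItemList:
    items: List[str]
    ordered: bool = False


Block = Union[Heading, Paragraph, ItemList]


def to_gemtext(document: List[Block]) -> str:
    """Render a list of blocks as Gemtext, one blank line between blocks."""
    lines = []
    for block in document:
        if isinstance(block, Heading):
            # Gemtext only has three heading levels.
            lines.append("#" * min(block.level, 3) + " " + block.text)
        elif isinstance(block, Paragraph):
            lines.append(block.text)
        elif isinstance(block, ItemList):
            for number, item in enumerate(block.items, start=1):
                # Gemtext has no ordered lists, so the number goes into the
                # item text instead of a placeholder in the string.
                prefix = f"* {number}. " if block.ordered else "* "
                lines.append(prefix + item)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Because numbering is decided while formatting a structured `ItemList`, no marker has to be smuggled through intermediate strings, and a second formatter (for example for Markdown) could walk the same blocks.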
2021-08-17	Fix ref being shown in quotes	Daniel Schadt

2021-08-17	add {{lang-XX}}, {{IPA-XX}} and {{XXX}} templates	Daniel Schadt

2021-08-17	Reorganize code	Daniel Schadt
	Cramming everything into a single file is not necessarily good, so this patch splits it up a bit. Furthermore, the templates are no longer hardcoded, but managed through a registry. This breaks the lang-ar implementation, which was a weird special case anyway. Properly fixing it would be to include all country codes.
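One way such a registry could look, as a sketch under assumptions: the `Registry` class, its `register` decorator, and the renderer signature are all hypothetical and not taken from wikimini's source.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical renderer signature: template arguments in, text out.
Renderer = Callable[[List[str]], str]


class Registry:
    """Maps template names to the functions that render them."""

    def __init__(self) -> None:
        self._templates: Dict[str, Renderer] = {}

    def register(self, *names: str) -> Callable[[Renderer], Renderer]:
        def decorator(func: Renderer) -> Renderer:
            # One renderer may be registered under several names.
            for name in names:
                self._templates[name.lower()] = func
            return func
        return decorator

    def get(self, name: str) -> Optional[Renderer]:
        return self._templates.get(name.lower())


registry = Registry()


@registry.register("main", "main article")
def render_main(args: List[str]) -> str:
    return "Main article: " + ", ".join(args)
```

With a lookup table like this, adding a template is a matter of registering one more function, and callers can supply their own registry instead of relying on hardcoded branches.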
2021-08-16	fix default header value for tabulate	Daniel Schadt

2021-08-16	Add rendering for {{quote|...}} templates	Daniel Schadt

2021-08-16	add {{main article|...}} as alias for {{main|...}}	Daniel Schadt

2021-08-16	Retrieve title as well	Daniel Schadt

2021-08-16	Add a redirection page	Daniel Schadt

2021-08-16	Handle lists without extra space	Daniel Schadt
	Sometimes, lists are written without a space between the list marker and the text, which can throw both the pattern-conversion and Gemini browsers off. It's better to insert the extra space here and be safe.
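A possible way to normalize such lines, assuming the list markers are `*` and `#` at the start of a line. The pattern and the function name `add_list_space` are hypothetical, and this sketch ignores other constructs that may start with the same characters.

```python
import re

# Hypothetical pattern: a run of list markers at the start of a line,
# directly followed by something that is neither whitespace nor a marker.
LIST_MARKER = re.compile(r"^([*#]+)(?=[^\s*#])", re.MULTILINE)


def add_list_space(text):
    """Insert a space after list markers that are glued to their text."""
    return LIST_MARKER.sub(r"\1 ", text)
```

Lines that already have the space are left alone, since the lookahead only matches when text follows the marker directly.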
2021-08-16	Render {{script|...}} like we do with {{lang|...}}	Daniel Schadt
	Even though those macros are slightly different, we can handle them in pretty much the same way (discard the meta information and just output the characters).
2021-08-16	Render {{main|...}} template	Daniel Schadt

2021-08-16	Add a setup.py	Daniel Schadt

2021-08-16	Initial commit	Daniel Schadt
	This is not even a proper Python package yet, but the output is surprisingly good already, so I'd like to take this version and save it.