Age | Commit message | Author |
|
Just to clean up the code a bit :-)
|
|
It doesn't really make sense to make the output that dependent on the
input type. convert_to_document basically does the same - ensuring that
the output is a Document. Templates that call convert recursively can
also call convert_to_document.
Furthermore, since we now pass very short Wikicode into
convert_to_document, an exception was raised if there were fewer than
two nodes. This seems to be caused by a "smart list" that
mwparserfromhell uses internally, which doesn't slice properly if the
end index is out of bounds.
|
|
Since we switched to our internal document representation, the templates
don't return the plain nodes anymore.
|
|
|
|
This has two reasons:
First, there are more prefixes than just File: that we might want to
strip. Category: was another example, but there are more: User:,
Help:, ... Using \w+ should catch them all.
Secondly, and maybe more importantly, different languages have their
namespaces localized as well. For example, in German, we have Datei:
instead of File:, or Kategorie: instead of Category:. This fix makes the
stripping work properly there as well.
One future change that might be needed is to expand the regex to catch
namespaces with a space or underscore in them.
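
A minimal sketch of the idea, not the actual code from this commit: the
pattern and the name strip_namespace are assumptions made for
illustration. In Python 3, \w is Unicode-aware for str patterns, so
localized namespaces are matched too.

```python
import re

# Hypothetical pattern: a run of word characters plus a colon at the very
# start of the link target. It does not cover namespaces containing spaces.
NAMESPACE_PREFIX = re.compile(r"^\w+:")


def strip_namespace(target: str) -> str:
    """Remove one leading namespace prefix such as "File:" or "Datei:"."""
    return NAMESPACE_PREFIX.sub("", target, count=1)


print(strip_namespace("File:Example.png"))
print(strip_namespace("Kategorie:Beispiel"))
# A namespace with a space is left alone, which is the known limitation.
print(strip_namespace("User talk:Foo"))
```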
|
|
|
|
Some templates seem to be invoked with a trailing space in the name,
which we need to strip before searching our template registry.
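
Roughly, the lookup could be sketched like this. TEMPLATE_REGISTRY,
find_template and the "example" entry are hypothetical names, not the
project's real ones.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: template name -> handler returning replacement text.
TEMPLATE_REGISTRY: Dict[str, Callable[[], str]] = {
    "example": lambda: "example output",
}


def find_template(raw_name: str) -> Optional[Callable[[], str]]:
    """Look up a handler, ignoring whitespace around the invoked name."""
    return TEMPLATE_REGISTRY.get(raw_name.strip())


# "{{example }}" arrives with a trailing space but still resolves.
handler = find_template("example ")
print(handler() if handler else "no such template")
```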
|
|
|
|
|
|
|
|
A List[Node] is basically a Paragraph, and we already delegated some of
the methods to Paragraph (see ItemList.cleanup). Therefore, it only made
sense to rework ItemList and BlockQuote to hold a Paragraph instead of a
List[Node].
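
As an illustration only: the class names come from this message, but the
fields and the cleanup rule below are assumptions, with plain strings
standing in for nodes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    # Assumption: nodes are modelled as plain strings in this sketch.
    nodes: List[str] = field(default_factory=list)

    def cleanup(self) -> None:
        """Drop nodes that contain only whitespace."""
        self.nodes = [node for node in self.nodes if node.strip()]


@dataclass
class ItemList:
    # Each item is a Paragraph rather than a bare list of nodes.
    items: List[Paragraph] = field(default_factory=list)

    def cleanup(self) -> None:
        """Delegate to Paragraph.cleanup, then drop items left empty."""
        for item in self.items:
            item.cleanup()
        self.items = [item for item in self.items if item.nodes]
```

Holding a Paragraph means ItemList no longer has to duplicate the node
handling itself.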
|
|
|
|
This keeps it more in line with BlockLink. Also, this adds a LineBreak
block, which templates can use to enforce extra line breaks.
|
|
|
|
Doing everything on strings is kinda wonky, so this adds an intermediate
representation. The idea behind this is that the pipeline now goes
Wikicode [1]-> Document [2]-> Output String
Where step 1 takes care of templates and everything, and step 2 does the
actual output formatting. This has the benefit that we can support
multiple output types, some with more and some with less features (e.g.,
adding a Markdown output which keeps some of the original formatting
intact), and it has the benefit of being less wonky (no hacks with
"<!NUM!>" for numbered lists, more streamlined formatting with newlines,
...).
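
Step 2 could be sketched as below. The classes are simplified stand-ins
for the real Document blocks (the fields and the render function are
assumptions), and step 1 is skipped by building the Document by hand.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class ItemList:
    items: List[str]
    ordered: bool = False


Block = Union[Paragraph, ItemList]


@dataclass
class Document:
    blocks: List[Block] = field(default_factory=list)


def render(doc: Document, bullet: str = "*") -> str:
    """Format a Document as text; numbering happens here, not via placeholders."""
    parts = []
    for block in doc.blocks:
        if isinstance(block, ItemList):
            lines = [
                f"{number}. {item}" if block.ordered else f"{bullet} {item}"
                for number, item in enumerate(block.items, start=1)
            ]
            parts.append("\n".join(lines))
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


doc = Document([Paragraph("Intro"), ItemList(["first", "second"], ordered=True)])
print(render(doc))
```

A second output type would then only need its own render function (or,
in this sketch, a different bullet argument), while step 1 stays the
same.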
|
|
|
|
|
|
Cramming everything into a single file is not necessarily good, so this
patch splits it up a bit. Furthermore, the templates are no longer
hardcoded, but managed through a registry.
This breaks the lang-ar implementation, which was a weird special case
anyway. A proper fix would be to include all country codes.
|
|
|
|
|
|
|
|
|
|
|
|
Sometimes, lists are written like
*text
*text
which can throw off both the pattern-conversion and Gemini browsers.
It's better to insert the extra space here and be safe.
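
One way to do this, as a sketch: the pattern and the function name are
assumptions, and only the "*" marker from the example above is handled.

```python
import re

# Hypothetical pattern: one or more "*" at the start of a line, directly
# followed by a character that is neither whitespace nor another "*".
LIST_MARKER = re.compile(r"^(\*+)(?=[^\s*])", re.MULTILINE)


def space_list_markers(text: str) -> str:
    """Insert a space between a list marker and the item text."""
    return LIST_MARKER.sub(r"\1 ", text)


print(space_list_markers("*text\n*text"))
```

Lines that already have the space are left unchanged, so the function
is safe to run on mixed input.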
|
|
Even though those macros are slightly different, we can handle them in
pretty much the same way (discard the meta information and just output
the characters).
|
|
|
|
|
|
This is not even a proper Python package yet, but the output is
surprisingly good already, so I'd like to take this version and save it.
|