Age | Commit message | Author |
|
|
|
|
|
|
|
This was an oversight when LineBreak was introduced
|
|
|
|
This allows us to build the documentation using
tox -e sphinx
|
|
|
|
We use tox as the runner, which allows us to easily run commands like
tox
tox -e pylint
tox -e mypy
to run the different tests/linters.
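For reference, a minimal tox.ini along these lines might look as follows;
the exact environments, dependencies and paths are assumptions, not the
actual configuration:

    [tox]
    envlist = py3

    [testenv]
    deps = pytest
    commands = pytest

    [testenv:pylint]
    deps = pylint
    # "src" is a placeholder for the real package/module path
    commands = pylint src

    [testenv:mypy]
    deps = mypy
    commands = mypy src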
|
|
|
|
|
|
This makes mypy happier about how we use .extend, since it doesn't know
that the items of a list should all be either Node or Block, never mixed.
In the first case, we can fix it by using .append(current[0]), as mypy
sees the isinstance() check above and narrows to the correct type.
In the second case, we use the cast no-op function to assign the correct
type. That should be cheap and makes mypy happy, even if it doesn't give
us any runtime checks (which is fine, this is Python after all).
In the third case, we're iterating over every element anyway, so we can
throw in an assert to make sure the list item has the correct type. This
also helps in catching some bugs, in case .convert() returns a mixed
list (but only if the first item happens to be a Node).
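A rough sketch of the three patterns (class and variable names are
illustrative stand-ins, not the actual code):

    from typing import List, Union, cast

    class Node: ...
    class Block: ...

    def example(current: List[Union[Node, Block]],
                converted: List[Union[Node, Block]]) -> None:
        nodes: List[Node] = []

        # Case 1: the isinstance() check lets mypy narrow current[0] to
        # Node, so .append() is accepted where .extend(current) was not.
        if current and isinstance(current[0], Node):
            nodes.append(current[0])

        # Case 2: cast() is a no-op at runtime, but tells mypy which of
        # the two element types this particular list really contains.
        nodes.extend(cast(List[Node], converted))

        # Case 3: when iterating anyway, an assert both narrows the type
        # and catches accidentally mixed lists at runtime.
        for item in converted:
            assert isinstance(item, Node)
            nodes.append(item)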
|
|
Just to clean up the code a bit :-)
|
|
It doesn't really make sense to make the output depend that much on the
input type. convert_to_document basically does the same thing: it ensures
that the output is a Document. Templates that call convert recursively
can also call convert_to_document.
Furthermore, since we now pass very short Wikicode into
convert_to_document, an exception was raised if there were fewer than 2
nodes. This seems to be because of a "smart list" that mwparserfromhell
uses internally, which didn't properly slice if the end index was out of
bounds.
|
|
Since we switched to our internal document representation, the templates
don't return the plain nodes anymore.
|
|
|
|
There are two reasons for this:
First, File: is not the only namespace we might want to strip. Category:
was another example, but there are more: User:, Help:, ... Using \w+
should catch them all.
Secondly, and maybe more importantly, different languages have their
namespaces localized as well. For example, in German, we have Datei:
instead of File:, or Kategorie: instead of Category:. This fix makes the
stripping work properly there as well.
One future change that might be needed is to expand the regex to catch
namespaces with a space or underscore in their names.
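A sketch of what the more general stripping could look like (the helper
name is made up, only the \w+ idea comes from this change):

    import re

    # Matches a leading "<Namespace>:" prefix such as "File:", "Category:",
    # "Datei:" or "Kategorie:"; \w+ also covers the localized names.
    NAMESPACE_RE = re.compile(r"^\w+:")

    def strip_namespace(title: str) -> str:
        return NAMESPACE_RE.sub("", title, count=1)

    # strip_namespace("Datei:Bild.jpg") == "Bild.jpg"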
|
|
|
|
Some templates seem to be invoked with a trailing space at the end of
the name, which we need to strip before searching our template registry.
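In code this boils down to something like the following sketch (the
registry lookup itself is only illustrative):

    import mwparserfromhell

    def lookup(template: mwparserfromhell.nodes.Template, registry: dict):
        # The parsed template name can carry trailing whitespace, so
        # normalize it before the registry lookup.
        name = str(template.name).strip()
        return registry.get(name)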
|
|
|
|
|
|
|
|
A List[Node] is basically a Paragraph, and we already delegated some of
the methods to Paragraph (see ItemList.cleanup). Therefore, it only made
sense to rework ItemList and BlockQuote to hold a Paragraph instead of a
List[Node].
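As a very rough sketch of the shape this takes (the field names and the
use of dataclasses are assumptions, not the actual code):

    from dataclasses import dataclass, field
    from typing import List

    class Node: ...

    @dataclass
    class Paragraph:
        nodes: List[Node] = field(default_factory=list)

    @dataclass
    class BlockQuote:
        # Holds a Paragraph instead of a bare List[Node], so cleanup and
        # formatting can be delegated to Paragraph.
        paragraph: Paragraph = field(default_factory=Paragraph)

    @dataclass
    class ItemList:
        # Same idea for the list items.
        items: List[Paragraph] = field(default_factory=list)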
|
|
|
|
This keeps it more in line with BlockLink. Also, this adds a LineBreak
block, which templates can use to enforce extra line breaks.
|
|
|
|
Doing everything on strings is kinda wonky, so this adds an intermediate
representation. The idea behind this is that the pipeline now goes
Wikicode [1]-> Document [2]-> Output String
where step 1 takes care of templates and everything, and step 2 does the
actual output formatting. This has two benefits: we can support multiple
output types, some with more and some with fewer features (e.g., adding a
Markdown output which keeps some of the original formatting intact), and
the whole thing is less wonky (no hacks with "<!NUM!>" for numbered
lists, more streamlined formatting with newlines, ...).
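As a toy sketch of the two-step shape (the classes and functions here are
deliberately simplified and not the project's actual API):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Paragraph:
        text: str

    @dataclass
    class Document:
        blocks: List[Paragraph]

    def convert(wikitext: str) -> Document:
        # Step 1: wikicode -> Document; template handling etc. lives here.
        return Document([Paragraph(p) for p in wikitext.split("\n\n") if p])

    def render(document: Document) -> str:
        # Step 2: Document -> output string; a different renderer (e.g. a
        # Markdown one) could format the same Document differently.
        return "\n\n".join(block.text for block in document.blocks)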
|
|
|
|
|
|
Cramming everything into a single file is not necessarily good, so this
patch splits it up a bit. Furthermore, the templates are no longer
hardcoded, but managed through a registry.
This breaks the lang-ar implementation, which was a weird special case
anyway. Fixing it properly would mean including all country codes.
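A common shape for such a registry, as a hedged sketch (the decorator
name and handler signature are assumptions):

    from typing import Callable, Dict

    # Maps a lower-cased template name to the function that handles it.
    TEMPLATES: Dict[str, Callable] = {}

    def register(name: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            TEMPLATES[name.lower()] = func
            return func
        return decorator

    @register("example")
    def example_template(template):
        # A handler receives the parsed template and returns whatever
        # should replace it in the output.
        return str(template.params[0].value) if template.params else ""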
|
|
|
|
|
|
|
|
|
|
|
|
Sometimes, lists are written like
*text
*text
which can throw off both the pattern conversion and Gemini browsers.
It's better to insert the extra space here and be safe.
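A sketch of that normalization (the exact regex is an assumption):

    import re

    def normalize_list_markers(wikitext: str) -> str:
        # Turn "*text" / "#text" at the start of a line into "* text" /
        # "# text" so list handling and Gemini clients see a proper marker.
        return re.sub(r"^([*#]+)(?=\S)", r"\1 ", wikitext, flags=re.MULTILINE)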
|
|
Even though those macros are slightly different, we can handle them in
pretty much the same way (discard the meta information and just output
the characters).
|
|
|
|
|
|
This is not even a proper Python package yet, but the output is
surprisingly good already, so I'd like to take this version and save it.
|