authorChris McDonough <chrism@agendaless.com>2010-08-06 20:04:53 +0000
committerChris McDonough <chrism@agendaless.com>2010-08-06 20:04:53 +0000
commit7b511ad431c5ada6ee1a54b5e40fbebdbeb4ee8d (patch)
treec04123cc342c2dae3545d811974ca4be8798e3a2 /docs/designdefense.rst
parentb05de75dd707a77c4ca3da5780de79efe4ad1092 (diff)
Microframework defense.
Diffstat (limited to 'docs/designdefense.rst')
-rw-r--r-- docs/designdefense.rst | 537
1 files changed, 537 insertions, 0 deletions
diff --git a/docs/designdefense.rst b/docs/designdefense.rst
index a5d515d4e..c0dc9b170 100644
--- a/docs/designdefense.rst
+++ b/docs/designdefense.rst
@@ -1199,6 +1199,543 @@ returns Zope3-security-proxy-wrapped objects for each traversed object
the effect of creating a more Zope3-like environment without much
effort.
+.. _microframeworks_smaller_hello_world:
+
+Microframeworks Have Smaller Hello World Programs
+-------------------------------------------------
+
+Some developers and microframework authors point out that BFG's "hello
+world" program is longer (by about five lines) than the equivalent
+program in their favorite microframework. Guilty as charged; in a
+contest of "whose is shortest", BFG indeed loses.
+
+This loss isn't for lack of trying: BFG aims to be useful in the same
+circumstance in which microframeworks claim dominance: single-file
+applications. But BFG doesn't sacrifice its ability to credibly
+support larger applications in order to achieve hello-world LoC parity
+with the current crop of microframeworks. BFG's design instead tries
+to avoid some common pitfalls associated with naive declarative
+configuration schemes.
+
+.. _you_dont_own_modulescope:
+
+Application Programmers Don't Control The Module-Scope Codepath (Import-Time Side-Effects Are Evil)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Please imagine a directory structure with a set of Python files in
+it:
+
+.. code-block:: text
+
+ .
+ |-- app.py
+ |-- app2.py
+ `-- config.py
+
+The contents of ``app.py``:
+
+.. code-block:: python
+
+   from config import decorator
+   from config import L
+   import pprint
+
+   @decorator
+   def foo():
+       pass
+
+   if __name__ == '__main__':
+       import app2
+       pprint.pprint(L)
+
+The contents of ``app2.py``:
+
+.. code-block:: python
+
+   import app
+
+   @app.decorator
+   def bar():
+       pass
+
+The contents of ``config.py``:
+
+.. code-block:: python
+
+   L = []
+
+   def decorator(func):
+       L.append(func)
+       return func
+
+If we ``cd`` to the directory that holds these files and run ``python
+app.py``, what happens?  Presumably, our ``decorator`` decorator will
+be used twice: once by the decorated function ``foo`` in ``app.py``
+and once by the decorated function ``bar`` in ``app2.py``.  Since the
+list ``L`` in ``config.py`` is appended to each time the decorator is
+used, we'd expect a list with two elements to be printed, right?
+Sadly, no:
+
+.. code-block:: bash
+
+ [chrism@thinko]$ python app.py
+ [<function foo at 0x7f4ea41ab1b8>,
+ <function foo at 0x7f4ea41ab230>,
+ <function bar at 0x7f4ea41ab2a8>]
+
+By visual inspection, that outcome (three different functions in the
+list) seems impossible. We only defined two functions and we
+decorated each of those functions only once, so we believe that the
+``decorator`` decorator will only run twice. However, what we believe
+is wrong because the code at module scope in our ``app.py`` module was
+*executed twice*. The code is executed once when the script is run as
+``__main__`` (via ``python app.py``), and then it is executed again
+when ``app2.py`` imports the same file as ``app``.
+
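The double execution is easy to reproduce in isolation.  The following
is a minimal sketch, separate from the example above: the file names
``demo_app.py`` and ``demo_registry.py`` and the function name are made
up for illustration.  It writes a tiny module with an import-time side
effect to a temporary directory, runs it once as ``__main__`` (via the
standard library's ``runpy``) and then imports the same file by name.

```python
import importlib
import os
import runpy
import sys
import tempfile


def count_module_scope_executions():
    """Run one file as a script, then import it; report each execution."""
    with tempfile.TemporaryDirectory() as tmp:
        # Shared registry module, analogous to config.py above.
        with open(os.path.join(tmp, "demo_registry.py"), "w") as f:
            f.write("EXECUTIONS = []\n")
        # Module with an import-time side effect, analogous to app.py.
        app_path = os.path.join(tmp, "demo_app.py")
        with open(app_path, "w") as f:
            f.write("import demo_registry\n"
                    "demo_registry.EXECUTIONS.append(__name__)\n")
        sys.path.insert(0, tmp)
        try:
            # First execution: the file runs as the script ``__main__``.
            runpy.run_path(app_path, run_name="__main__")
            # Second execution: the very same file is imported by name.
            importlib.import_module("demo_app")
            registry = importlib.import_module("demo_registry")
            return list(registry.EXECUTIONS)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_app", None)
            sys.modules.pop("demo_registry", None)
```

Calling ``count_module_scope_executions()`` should report two entries,
one for ``__main__`` and one for ``demo_app``, even though only a single
file containing the side effect exists on disk.
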
+What does this have to do with our comparison to microframeworks?
+Many microframeworks in the current crop (e.g. Bottle, Flask)
+encourage you to attach configuration decorators to objects defined at
+module scope.  These decorators execute arbitrarily complex
+registration code which populates a singleton registry that is a
+global defined in an external Python module.  This is analogous to the
+above example: the "global registry" in the above example is the list
+``L``.
+
+Let's see what happens when we use the same pattern with the
+`Groundhog <http://bfg.repoze.org/videos#groundhog1>`_ microframework.
+Replace the contents of ``app.py`` above with this:
+
+.. code-block:: python
+
+   from config import gh
+
+   @gh.route('/foo/')
+   def foo():
+       return 'foo'
+
+   if __name__ == '__main__':
+       import app2   # importing app2 re-imports this file as ``app``
+       gh.run()
+
+Replace the contents of ``app2.py`` above with this:
+
+.. code-block:: python
+
+   import app
+
+   @app.gh.route('/bar/')
+   def bar():
+       return 'bar'
+
+Replace the contents of ``config.py`` above with this:
+
+.. code-block:: python
+
+ from groundhog import Groundhog
+ gh = Groundhog('myapp', 'seekrit')
+
+How many routes will be registered within the routing table of the
+"gh" Groundhog application? If you answered three, you are correct.
+How many would a casual reader (and any sane developer) expect to be
+registered? If you answered two, you are correct. Will the double
+registration be a problem? With our fictional Groundhog framework's
+``route`` method backing this application, not really. It will slow
+the application down a little bit, because it will need to miss twice
+for a route when it does not match. Will it be a problem with another
+framework, another application, or another decorator? Who knows. You
+need to understand the application in its totality, the framework in
+its totality, and the chronology of execution to be able to predict
+what the impact of unintentional code double-execution will be.
+
+The encouragement to use decorators which populate an external
+registry has an unintended consequence: the application developer now
+must assert ownership of every codepath that executes Python
+module-scope code.  The current crop of decorator-based
+microframeworks presumes that this code executes once and only once;
+if it executes more than once, weird things will start to happen.  It
+is up to the application developer to maintain this invariant.
+Unfortunately, this is an impossible task, because Python programmers
+*do not own the module scope codepath, and never will*.
+Microframework programmers will therefore at some point need to start
+reading the tea leaves about what *might* happen if module-scope code
+gets executed more than once, as we did in the previous paragraph.
+This is a poor situation to find yourself in as an application
+developer: you probably didn't even know you signed up for the job,
+because the documentation offered by decorator-based microframeworks
+doesn't warn you about it.
+
+Python application programmers do not control the module scope
+codepath. Anyone who tries to sell you on the idea that they do is
+simply mistaken. Test runners that you may want to use to run your
+code's tests often perform imports of arbitrary code in strange orders
+that manifest bugs like the one demonstrated above. API documentation
+generation tools do the same.  Some (mutant) people even think it's
+safe to use the Python ``reload`` command or delete objects from
+``sys.modules``, each of which has hilarious effects when used against
+code that has import-time side effects.  When Python programmers
+assume they can use the module-scope codepath to run arbitrary code
+(especially code which populates an external registry), and this
+assumption is challenged by reality, the application developer is
+often required to undergo a painful, meticulous debugging process to
+find the root cause of an inevitably obscure symptom. The solution is
+often to rearrange application import ordering or move an import
+statement from module-scope into a function body. The rationale for
+doing so can never be expressed adequately in the checkin message
+which accompanies the fix or documented succinctly enough for the
+benefit of the rest of the development team so that the problem never
+happens again. It will happen again next month too, especially if you
+are working on a project with other people who haven't yet
+internalized the lessons you learned while you stepped through
+module-scope code using ``pdb``.
+
+Folks who have a large investment in eager decorator-based
+configuration that populates an external data structure (such as
+microframework authors) may argue that the set of circumstances I
+outlined above is anomalous and contrived. They will argue that it
+just will never happen. If you never intend your application to grow
+beyond one or two or three modules, that's probably true. However, as
+your codebase grows, and becomes spread across a greater number of
+modules, the circumstances in which module-scope code will be executed
+multiple times will become more and more likely to occur and less and
+less predictable. It's not responsible to claim that double-execution
+of module-scope code will never happen. It will; it's just a matter
+of luck, time, and application complexity.
+
+If microframework authors do admit that the circumstance isn't
+contrived, they might then argue that "real" damage will never happen
+as the result of the double-execution (or triple-execution, etc) of
+module scope code. You would be wise to disbelieve this assertion.
+The potential outcomes of multiple execution are too numerous to
+predict because they involve delicate relationships between
+application and framework code as well as chronology of code
+execution. It's literally impossible for a framework author to know
+what will happen in all circumstances ("X is executed, then Y, then X
+again.. a train leaves Chicago at 50 mph... "). And even if given the
+gift of omniscience for some limited set of circumstances, the
+framework author almost certainly does not have the double-execution
+anomaly in mind when coding new features. He's thinking of adding a
+feature, not protecting against problems that might be caused by the
+1% multiple execution case. However, any 1% case may cause 50% of
+your pain on a project, so it'd be nice if it never occurred.
+
+Responsible microframeworks actually offer a back-door way around the
+problem. They allow you to disuse decorator based configuration
+entirely. Instead of requiring you to do the following:
+
+.. code-block:: python
+
+   gh = Groundhog('myapp', 'seekrit')
+
+   @gh.route('/foo/')
+   def foo():
+       return 'foo'
+
+   if __name__ == '__main__':
+       gh.run()
+
+They allow you to disuse the decorator syntax and go
+almost-all-imperative:
+
+.. code-block:: python
+
+   def foo():
+       return 'foo'
+
+   gh = Groundhog('myapp', 'seekrit')
+
+   if __name__ == '__main__':
+       gh.add_route(foo, '/foo/')
+       gh.run()
+
+This is a generic mode of operation that is encouraged in the BFG
+documentation. Some existing microframeworks (Flask, in particular)
+allow for it as well. None (other than BFG) *encourage* it. If you
+never expect your application to grow beyond two or three or four or
+ten modules, it probably doesn't matter very much which mode you use.
+If your application grows large, however, imperative configuration can
+provide better predictability.
+
+.. note::
+
+ Astute readers may notice that BFG has configuration decorators too.
+ Aha! Don't these decorators have the same problems? No. These
+ decorators do not populate an external Python module when they are
+ executed. They only mutate the functions (and classes and methods)
+ they're attached to. These mutations must later be found during a
+ "scan" process that has a predictable and structured import phase.
+ Module-localized mutation is actually the best-case circumstance for
+ double-imports: if a module only mutates itself and its contents at
+ import time and it is imported twice, that's OK, because each
+ decorator invocation will always be mutating an independent copy of
+ the object it's attached to, not a shared resource like a registry in
+ another module.  This has the effect that double-registrations will
+ never be performed.
+
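The idea in the note can be sketched in a few lines.  This is an
illustration only and is not BFG's implementation: ``view_config``,
``scan`` and the ``__route__`` attribute are hypothetical names, and
``exec`` is used merely to simulate the same module-scope code being
executed twice.

```python
def view_config(route):
    """Hypothetical decorator: record the route on the function itself."""
    def attach(func):
        func.__route__ = route  # mutate only the decorated object
        return func
    return attach


def scan(namespace):
    """Hypothetical scan step: build a fresh registry from one namespace."""
    registry = {}
    for obj in namespace.values():
        route = getattr(obj, "__route__", None)
        if callable(obj) and route is not None:
            registry[route] = obj
    return registry


# Stand-in for the source of a module that uses the decorator.
MODULE_SOURCE = '''
@view_config('/foo/')
def foo():
    return 'foo'
'''


def execute_module_twice_then_scan():
    """Execute the same module-scope code twice, then scan one copy."""
    first = {"view_config": view_config}
    second = {"view_config": view_config}
    exec(MODULE_SOURCE, first)   # first execution of the module body
    exec(MODULE_SOURCE, second)  # accidental second execution
    return scan(first)           # registration happens here, exactly once
```

Because the decorator touches nothing outside the function it wraps,
the second execution only produces an independent copy of ``foo``; the
registry built by the explicit scan still contains a single entry for
``/foo/``.
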
+Routes (Usually) Need Relative Ordering
++++++++++++++++++++++++++++++++++++++++
+
+Consider the following simple `Groundhog
+<http://bfg.repoze.org/videos#groundhog1>`_ application:
+
+.. code-block:: python
+
+   from groundhog import Groundhog
+   app = Groundhog('myapp', 'seekrit')
+
+   @app.route('/admin')
+   def admin():
+       return '<html>admin page</html>'
+
+   @app.route('/:action')
+   def action():
+       if action == 'add':
+           return '<html>add</html>'
+       if action == 'delete':
+           return '<html>delete</html>'
+       return app.abort(404)
+
+   if __name__ == '__main__':
+       app.run()
+
+If you run this application and visit the URL ``/admin``, you will see
+the "admin page".  This is the intended result.  However, what if you
+rearrange the order of the function definitions in the file?
+
+.. code-block:: python
+
+   from groundhog import Groundhog
+   app = Groundhog('myapp', 'seekrit')
+
+   @app.route('/:action')
+   def action():
+       if action == 'add':
+           return '<html>add</html>'
+       if action == 'delete':
+           return '<html>delete</html>'
+       return app.abort(404)
+
+   @app.route('/admin')
+   def admin():
+       return '<html>admin page</html>'
+
+   if __name__ == '__main__':
+       app.run()
+
+If you run this application and visit the URL ``/admin``, you will now
+be returned a 404 error. This is probably not what you intended. The
+reason you see a 404 error when you rearrange function definition
+ordering is that routing declarations expressed via our
+microframework's routing decorators have an *ordering*, and that
+ordering matters.
+
+In the first case, where we achieved the expected result, we first
+added a route with the pattern ``/admin``, then we added a route with
+the pattern ``/:action`` by virtue of adding routing patterns via
+decorators at module scope. When a request with a ``PATH_INFO`` of
+``/admin`` enters our application, the web framework loops over each
+of our application's route patterns in the order in which they were
+defined in our module. As a result, the view associated with the
+``/admin`` routing pattern will be invoked: it matches first. All is
+right with the world.
+
+In the second case, where we did not achieve the expected result, we
+first added a route with the pattern ``/:action``, then we added a
+route with the pattern ``/admin``. When a request with a
+``PATH_INFO`` of ``/admin`` enters our application, the web framework
+loops over each of our application's route patterns in the order in
+which they were defined in our module. As a result, the view
+associated with the ``/:action`` routing pattern will be invoked: it
+matches first. A 404 error is raised.
+
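The first-match behavior described in the two cases above can be
sketched without any framework.  This is an assumption about how such a
router typically works, not Groundhog's actual code; ``compile_pattern``
and ``first_match`` are hypothetical helpers.

```python
import re


def compile_pattern(pattern):
    """Turn a pattern such as '/:action' into an anchored regex."""
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^" + regex + "$")


def first_match(routes, path):
    """Return the name of the first route whose pattern matches ``path``.

    ``routes`` is a list of (pattern, name) pairs in registration order.
    """
    for pattern, name in routes:
        if compile_pattern(pattern).match(path):
            return name
    return None


# Registration order mirrors the two listings above.
admin_first = [("/admin", "admin"), ("/:action", "action")]
action_first = [("/:action", "action"), ("/admin", "admin")]
```

With these definitions, ``first_match(admin_first, "/admin")`` selects
the ``admin`` view, while ``first_match(action_first, "/admin")``
selects ``action``, because the wildcard pattern is consulted first and
also matches.
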
+You may be willing to maintain an ordering of your view functions
+which reifies your routing policy. Your application may be small
+enough where this will never cause an issue. If it becomes large
+enough to matter, however, I don't envy you. Maintaining that
+ordering as your application grows larger will be difficult. At some
+point, you will also need to start controlling *import* ordering as
+well as function definition ordering. When your application grows
+beyond the size of a single file, and when decorators are used to
+register views, the non-``__main__`` modules which contain
+configuration decorators must be imported somehow for their
+configuration to be executed.
+
+Does that make you a little
+uncomfortable? It should, because :ref:`you_dont_own_modulescope`.
+
+"Stacked Object Proxies" Are Too Clever / Thread Locals Are A Nuisance
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+In another manifestation of "import fascination", some microframeworks
+use the ``import`` statement to get a handle to an object which *is
+not logically global*:
+
+.. code-block:: python
+
+   from flask import request
+
+   @app.route('/login', methods=['POST', 'GET'])
+   def login():
+       error = None
+       if request.method == 'POST':
+           if valid_login(request.form['username'],
+                          request.form['password']):
+               return log_the_user_in(request.form['username'])
+           else:
+               error = 'Invalid username/password'
+       # this is executed if the request method was GET or the
+       # credentials were invalid
+
+The `Pylons <http://pylonshq.com>`_ web framework uses a similar
+strategy. It calls these things "Stacked Object Proxies", so, for
+purposes of this discussion, I'll do so as well.
+
+Import statements in Python (``import foo``, ``from bar import baz``)
+are most frequently performed to obtain a reference to an object
+defined globally within an external Python module. However, in
+"normal" programs, they are never used to obtain a reference to an
+object that has a lifetime measured by the scope of the body of a
+function. It would be absurd to try to import, for example, a
+variable named ``i`` representing a loop counter defined in the body
+of a function. For example, we'd never try to import ``i`` from the
+code below:
+
+.. code-block:: python
+
+   def afunc():
+       for i in range(10):
+           print(i)
+
+By its nature, the *request* object created as the result of a WSGI
+server's call into a long-lived web framework cannot be global,
+because the lifetime of a single request will be much shorter than the
+lifetime of the process running the framework. A request object
+created by a web framework actually has more similarity to the ``i``
+loop counter in our example above than it has to any comparable
+importable object defined in the Python standard library or in
+"normal" library code.
+
+However, systems which use stacked object proxies promote locally
+scoped objects such as ``request`` out to module scope, for the
+purpose of being able to offer users a "nice" spelling involving
+``import``. They, for what I consider dubious reasons, would rather
+present to their users the canonical way of getting at a ``request``
+as ``from framework import request`` instead of a saner ``from
+myframework.threadlocals import get_request; request = get_request()``
+even though the latter is more explicit.
+
+It would be *most* explicit if the microframeworks did not use thread
+local variables at all. BFG view functions are passed a request
+object; many of BFG's APIs require that an explicit request object be
+passed to them. It is *possible* to retrieve the current BFG request
+as a threadlocal variable but it is an "in case of emergency, break
+glass" type of activity. This explicitness makes BFG view functions
+more easily unit testable, as you don't need to rely on the framework
+to manufacture suitable "dummy" request (and other similarly-scoped)
+objects during test setup. It also makes them more likely to work on
+arbitrary systems, such as async servers that do no monkeypatching.
+
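To illustrate the testing claim, here is a minimal sketch of a view
that receives its request explicitly, together with a hand-rolled
stand-in request.  ``DummyRequest`` and ``greet`` are hypothetical
names invented for this example; nothing here depends on BFG itself.

```python
class DummyRequest:
    """Hypothetical stand-in exposing only what the view under test reads."""

    def __init__(self, method="GET", params=None):
        self.method = method
        self.params = params or {}


def greet(request):
    """A view that takes its request as an argument, not from a threadlocal."""
    name = request.params.get("name", "world")
    return "Hello %s!" % name


def test_greet():
    # No framework setup or threadlocal juggling: build a request, call the view.
    assert greet(DummyRequest()) == "Hello world!"
    assert greet(DummyRequest(params={"name": "bfg"})) == "Hello bfg!"
```

The test needs no registry, no server and no patched globals, which is
the practical payoff of passing the request in.
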
+Explicitly WSGI
++++++++++++++++
+
+Some microframeworks offer a ``run()`` method of an application object
+that executes a default server configuration for easy execution.
+
+BFG doesn't currently try to hide the fact that its router is a WSGI
+application behind a convenience ``run()`` API. It just tells people
+to import a WSGI server and use it to serve up their BFG application
+as per the documentation of that WSGI server.
+
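Serving "by hand" amounts to very little code.  The sketch below uses
the standard library's ``wsgiref`` server as a stand-in for whatever
WSGI server you prefer (that substitution is my assumption, made only so
the example has no third-party dependencies); ``hello_app``,
``call_app`` and ``serve_by_hand`` are hypothetical names.

```python
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI application: any WSGI server can run it."""
    body = b"Hello world!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, the same way a server would."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def serve_by_hand(app, host="127.0.0.1", port=8080):
    """Blocks forever serving ``app``; swap in another WSGI server freely."""
    make_server(host, port, app).serve_forever()
```

Because the application object is plain WSGI, exchanging ``wsgiref`` for
another server changes only ``serve_by_hand``; the application and
``call_app`` stay untouched.
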
+The extra lines saved by abstracting away the serving step behind
+``run()`` seem to have driven dubious second-order decisions related
+to API in some microframeworks. For example, Bottle contains a
+``ServerAdapter`` subclass for each type of WSGI server it supports
+via its ``app.run()`` mechanism. This means that there exists code in
+``bottle.py`` that depends on the following modules: ``wsgiref``,
+``flup``, ``paste``, ``cherrypy``, ``fapws``, ``tornado``,
+``google.appengine``, ``twisted.web``, ``diesel``, ``gevent``,
+``gunicorn``, ``eventlet``, and ``rocket``. You choose the kind of
+server you want to run by passing its name into the ``run`` method.
+In theory, this sounds great: I can try Bottle out on ``gunicorn``
+just by passing in a name! However, to fully test Bottle, all of
+these third-party systems must be installed and functional; the Bottle
+developers must monitor changes to each of these packages and make
+sure their code still interfaces properly with them. This expands the
+packages required for testing greatly; this is a *lot* of
+requirements. It is likely difficult to fully automate these tests
+due to requirements conflicts and build issues.
+
+As a result, for single-file apps, we currently don't bother to offer
+a ``run()`` shortcut; we tell folks to import their WSGI server of
+choice and run it "by hand". For the people who want a server
+abstraction layer, we suggest that they use PasteDeploy. In
+PasteDeploy-based systems, the onus for making sure that the server
+can interface with a WSGI application is placed on the server
+developer, not the web framework developer, making it more likely to
+be timely and correct.
+
+All of the above said, BFG version 1.3 will offer a ``run()`` - like
+shortcut serving API which executes the ``paste.httpserver`` WSGI
+server. It will likely be named less attractively to indicate it is
+only a shortcut.
+
+:meth:`repoze.bfg.configuration.Configurator.begin` and
+:meth:`repoze.bfg.configuration.Configurator.end` methods
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+The methods :meth:`repoze.bfg.configuration.Configurator.begin` and
+:meth:`repoze.bfg.configuration.Configurator.end` are used to bracket
+the configuration phase of a :mod:`repoze.bfg` application.
+
+These exist because existing legacy third party *configuration* (not
+runtime) code relies on a threadlocal stack being populated. The
+``begin`` method pushes data on to a threadlocal stack. The ``end``
+method pops it back off.
+
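The push/pop bracketing can be sketched as follows.  This is a
simplified illustration under my own assumptions, not the actual
``repoze.bfg`` implementation: ``ThreadLocalStack``,
``SketchConfigurator`` and ``legacy_lookup`` are hypothetical names.

```python
import threading


class ThreadLocalStack(threading.local):
    """A per-thread stack; each thread sees its own ``items`` list."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop() if self.items else None

    def current(self):
        return self.items[-1] if self.items else None


_stack = ThreadLocalStack()


class SketchConfigurator:
    """Hypothetical configurator that brackets configuration with begin/end."""

    def __init__(self, registry):
        self.registry = registry

    def begin(self):
        _stack.push(self.registry)  # make the registry findable via threadlocal

    def end(self):
        return _stack.pop()         # restore the previous state


def legacy_lookup():
    """Stands in for third-party configuration code that reads the threadlocal."""
    return _stack.current()
```

Code like ``legacy_lookup`` only finds a registry while it runs between
``begin()`` and ``end()``, which is why the bracketing is effectively
required once such code is involved.
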
+For the simplest applications, these lines are actually not required.
+I *could* omit them from every BFG hello world app without ill effect.
+However, when users use certain configuration methods (ones not
+represented in the hello world app), calling code will begin to fail
+when it is not bracketed between a ``begin()`` and an ``end()``. It
+is just easier to tell users that this bracketing is required than to
+try to explain to them in which circumstances it is actually required
+and in which it is not, because the explanation is often tortuous.
+
+The effectively-required execution of these two methods is a wholly
+bogus artifact of an early bad design decision which encouraged
+application developers to use threadlocal data structures during the
+execution of configuration plugins. However, I don't hate my
+framework's users enough to break backwards compatibility for the sake
+of removing two boilerplate lines of code, so it stays, at least for
+the foreseeable future. If I eventually figure out a way to remove
+the requirement, these methods will turn into no-ops and they will be
+removed from the documentation.
+
+Wrapping Up
++++++++++++
+
+Here's a diagrammed version of the simplest repoze.bfg application,
+where comments take into account what we've discussed in the
+:ref:`microframeworks_smaller_hello_world` section.
+
+.. code-block:: python
+
+   from webob import Response                 # explicit response objects, no TL
+   from paste.httpserver import serve         # explicitly WSGI
+
+   def hello_world(request):  # accepts a request; no request thread local reqd
+       # explicit response object means no response threadlocal
+       return Response('Hello world!')
+
+   if __name__ == '__main__':
+       from repoze.bfg.configuration import Configurator
+       config = Configurator()       # no global application object.
+       config.begin()                # bogus, but required.
+       config.add_view(hello_world)  # explicit non-decorator registration
+       config.end()                  # bogus, but required.
+       app = config.make_wsgi_app()  # explicitly WSGI
+       serve(app, host='0.0.0.0')    # explicitly WSGI
+
+
Other Challenges
----------------