Converting BigBlueButton recordings to self-contained videos
When the pandemic lockdowns started, my local Linux User Group started looking at video conferencing tools we could use to continue presenting talks and other events to members. We ended up adopting BigBlueButton: as well as being Open Source, its focus on education made it well suited for presenting talks. It has the concept of a presenter role, and built-in support for slides (it sends them to viewers as images, rather than as another video stream). It can also record sessions for later viewing.
Using GAsyncResult APIs with Python's asyncio
With a GLib implementation of the Python asyncio event loop, I can easily mix asyncio code with GLib/GTK code in the same thread. The next step is to see whether we can use this to make any APIs more convenient to use. A good candidate is APIs that make use of `GAsyncResult`.
These APIs generally consist of one function call that initiates the asynchronous job and takes a callback. The callback will be invoked sometime later with a `GAsyncResult` object, which can be passed to a "finish" function to convert this to the result type relevant to the original call. This sort of API is a good candidate to convert to an asyncio coroutine.
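As a sketch of the shape this takes (assuming the GLib-based asyncio event loop from the previous post is installed; the `wrap_gio_async` helper name is my own, not a library API):

```python
import asyncio
from gi.repository import Gio

def wrap_gio_async(start_func, finish_func, *args):
    # Adapt a call/finish pair to a future: start the job, then
    # resolve the future from the GAsyncReadyCallback.
    future = asyncio.get_event_loop().create_future()

    def callback(source, result):
        try:
            future.set_result(finish_func(source, result))
        except Exception as exc:
            future.set_exception(exc)

    start_func(*args, callback)
    return future

async def read_file(path):
    gfile = Gio.File.new_for_path(path)
    # load_contents_async(cancellable, callback) pairs with
    # load_contents_finish(result).
    ok, contents, etag = await wrap_gio_async(
        gfile.load_contents_async, Gio.File.load_contents_finish, None)
    return contents
```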
Exploring Github Actions
To help keep myself honest, I wanted to set up automated test runs on a few personal projects I host on Github. At first I gave Travis a try, since a number of projects I contribute to use it, but it felt a bit clunky. When I found Github had a new CI system in beta, I signed up for the beta and was accepted a few weeks later.
While it is still in development, the configuration language feels lean and powerful. In comparison, Travis's configuration language has obviously evolved over time, with some features not interacting properly (e.g. matrix expansion only working on the first job in a workflow using build stages). While I've never felt like I had a complete grasp of the Travis configuration language, the single-page description of the Actions configuration language feels complete.
GLib integration for the Python asyncio event loop
As an evening project, I've been working on a small library that integrates the GLib main loop with Python's asyncio. I think I've gotten to the point where it might be useful to other people, so have pushed it up here:
https://github.com/jhenstridge/asyncio-glib
This isn't the only attempt to integrate the two event loops, but the other one I found (Gbulb) is unmaintained and seems to reimplement a fair bit of asyncio (e.g. it has its own transport classes). So I thought I'd see if I could write something smaller and more maintainable, reusing as much code from the standard library as possible.
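The intended usage is only a couple of lines (a minimal sketch, based on my recollection of the project README):

```python
import asyncio
import asyncio_glib

# Install the GLib-backed event loop for all subsequent asyncio calls.
asyncio.set_event_loop_policy(asyncio_glib.GLibEventLoopPolicy())

async def main():
    await asyncio.sleep(1)  # serviced by the GLib main loop

asyncio.get_event_loop().run_until_complete(main())
```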
Extracting BIOS images and tools from ThinkPad update ISOs
With my old ThinkPad, Lenovo provided BIOS updates in the form of Windows executables or ISO images for a bootable CD. Since I had wiped the Windows partition, the first option wasn't an option. The second option didn't work either, since it expected me to be using the drive in the base I hadn't bought. Luckily I was able to just copy the needed files out of the ISO image to a USB stick that had been set up to boot DOS.
u1ftp: a demonstration of the Ubuntu One API
One of the projects I've been working on has been to improve aspects of the Ubuntu One Developer Documentation web site. While there are still some layout problems we are working on, it is now in a state where it is a lot easier for us to update.
I have been working on updating our authentication/authorisation documentation and revising some of the file storage documentation (the API used by the mobile Ubuntu One clients). To help verify that the documentation was useful, I wrote a small program to exercise those APIs. The result is u1ftp: a program that exposes a user's files via an FTP daemon running on localhost. In conjunction with the OS file manager or a dedicated FTP client, this can be used to conveniently access your files on a system without the full Ubuntu One client installed.
Packaging Python programs as runnable ZIP files
One feature in recent versions of Python I hadn't played around with until recently is the ability to package up a multi-module program into a ZIP file that can be run directly by the Python interpreter. I didn't find much information about it, so I thought I'd describe what's necessary here.
Python has had the ability to add ZIP files to the module search path since PEP 273 was implemented in Python 2.3. That can let you package up most of your program into a single file, but doesn't help with the main entry point.
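The missing piece is that the interpreter will execute a `__main__.py` found at the root of the archive. A minimal sketch of building such an archive (file names here are hypothetical):

```python
import zipfile

# Build an archive with the entry point at its root; "python app.zip"
# will then put the ZIP on sys.path and run __main__.py from it.
with zipfile.ZipFile("app.zip", "w") as zf:
    zf.write("entry.py", arcname="__main__.py")
    zf.write("mypackage/__init__.py")
    zf.write("mypackage/util.py")
```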
pygpgme 0.3
This week I put out a new release of pygpgme: a Python extension that lets you perform various tasks with OpenPGP keys via the GPGME library. The new release is available from both Launchpad and PyPI.
There aren't any major new extensions to the API, but this is the first release to support Python 3 (Python 2.x is still supported though). The main hurdle was ensuring that the module correctly handled text vs. binary data. The split I ended up with was to treat most things as text (including textual representations of binary data such as key IDs and fingerprints), and treat the data being passed into or returned from the encryption, decryption, signing and verification commands as binary data. I haven't done a huge amount with the Python 3 version of the module yet, so I'd appreciate bug reports if you find issues.
Watching iView with Rygel
One of the features of Rygel that I found most interesting was the external media server support. It looked like an easy way to publish information on the network without implementing a full UPnP/DLNA media server (i.e. handling the UPnP multicast traffic, transcoding to a format that the remote system can handle, etc).
As a small test, I put together a server that exposes the ABC's iView service to UPnP media renderers. The result is a bit rough around the edges, but the basic functionality works. The source can be grabbed using Bazaar:
django-openid-auth
Last week, we released the source code to django-openid-auth. This is a small library that can add OpenID based authentication to Django applications. It has been used for a number of internal Canonical projects, including the sprint scheduler Scott wrote for the last Ubuntu Developer Summit, so it is possible you've already used the code.
Rather than trying to cover all possible use cases of OpenID, it focuses on providing OpenID Relying Party support to applications using Django's django.contrib.auth authentication system. As such, it is usually enough to edit just two files in an existing application to enable OpenID login.
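For illustration, the sort of changes involved in those two files (sketched from memory; the project's README is the authoritative reference):

```python
# settings.py (sketch)
INSTALLED_APPS += ('django_openid_auth',)

AUTHENTICATION_BACKENDS = (
    'django_openid_auth.auth.OpenIDBackend',
    'django.contrib.auth.backends.ModelBackend',
)
LOGIN_URL = '/openid/login/'

# urls.py (sketch, using the URL dispatcher API of the era)
from django.conf.urls.defaults import include, patterns

urlpatterns = patterns('',
    (r'^openid/', include('django_openid_auth.urls')),
)
```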
Getting "bzr send" to work with GMail
One of the nice features of Bazaar is the ability to send a bundle of changes to someone via email. If you use a supported mail client, it will even open the composer with the changes attached. If your client isn't supported, then it'll let you compose a message in your editor and then send it to an SMTP server.
GMail is not a supported mail client, but there are a few workarounds listed on the wiki. Those really come down to using an alternative mail client (either the editor or Mutt) and sending the mails through the GMail SMTP server. Neither solution really appealed to me. There doesn't seem to be a programmatic way of opening up GMail's compose window and adding an attachment (not too surprising for a web app).
Using Twisted Deferred objects with gio
The gio library provides both synchronous and asynchronous interfaces for performing IO. Unfortunately, the two APIs require quite different programming styles, making it difficult to convert code written to the simpler synchronous API to the asynchronous one.
For C programs this is unavoidable, but for Python we should be able to do better. And if you're doing asynchronous event driven code in Python, it makes sense to look at Twisted. In particular, Twisted's Deferred objects can be quite helpful.
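The pattern is the same as for any callback-based API: start the operation, then fire the Deferred from its callback. A rough sketch (the helper is mine, not from the post):

```python
from twisted.internet import defer

def gio_to_deferred(start_func, finish_func, *args):
    # Adapt a gio start/finish pair into a Twisted Deferred.
    d = defer.Deferred()

    def callback(obj, result):
        try:
            d.callback(finish_func(obj, result))
        except Exception:
            d.errback()  # captures the current exception as a Failure

    start_func(*(args + (callback,)))
    return d
```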
Thoughts on OAuth
I've been playing with OAuth a bit lately. The OAuth specification fulfills a role that some people saw as a failing of OpenID: programmatic access to websites and authenticated web services. The expectation that OpenID would handle these cases seems a bit misguided, since the two use cases are quite different:
- OpenID is designed on the principle of letting arbitrary OpenID providers talk to arbitrary relying parties and vice versa.
- OpenID is intentionally vague about how the provider authenticates the user. The only restriction is that the authentication must be able to fit into a web browsing session between the user and provider.
While these are quite useful features for a decentralised user authentication scheme, the requirements for web service authentication are quite different:
Django support landed in Storm
Since my last article on integrating Storm with Django, I've merged my changes to Storm's trunk. This missed the 0.13 release, so you'll need to use Bazaar to get the latest trunk or wait for 0.14.
The focus since the last post was to get Storm to cooperate with Django's built-in ORM. One of the reasons people use Django is the existing components that can be used to build a site. These range from the included user management and administration code to full web shop implementations. So even if you plan to use Storm for your Django application, your application will most likely use Django's ORM for some things.
Transaction Management in Django
In my previous post about Django, I mentioned that I found the transaction handling strategy in Django to be a bit surprising.
Like most object relational mappers, it caches information retrieved from the database, since you don't want to be constantly issuing SELECT queries for every attribute access. However, it defaults to committing after saving changes to each object. So a single web request might end up issuing many transactions:
| Operation | Transaction |
|---|---|
| Change object 1 | Transaction 1 |
| Change object 2 | Transaction 2 |
| Change object 3 | Transaction 3 |
| Change object 4 | Transaction 4 |
| Change object 5 | Transaction 5 |
Unless no one else is accessing the database, there is a chance that other users could modify objects that the ORM has cached across those transaction boundaries. This also makes it difficult to test your application in any meaningful way, since it is hard to predict what changes will occur at those points. Django does provide a few ways to get better transactional behaviour.
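For example, the decorator from Django's transaction module of that era widens the transaction to cover a whole function (the `Player` model here is hypothetical):

```python
from django.db import transaction

@transaction.commit_on_success
def update_scores(request):
    # Every save below joins one transaction: all committed together
    # on success, all rolled back if an exception escapes.
    for player in Player.objects.all():  # Player is a hypothetical model
        player.score += 1
        player.save()
```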
Storm 0.13
Yesterday, Thomas rolled the 0.13 release of Storm, which can be downloaded from Launchpad. Storm is the object relational mapper for Python used by Launchpad and Landscape, so it is capable of supporting quite large scale applications. It is seven months since the last release, so there are a lot of improvements. Here are a few simple statistics:
| | 0.12 | 0.13 | Change |
|---|---|---|---|
| Tarball size (KB) | 117 | 155 | 38 |
| Mainline revisions | 213 | 262 | 49 |
| Revisions in ancestry | 552 | 875 | 323 |
So it is a fairly significant update by any of these metrics. Among the new features are:
Using Storm with Django
I've been playing around with Django a bit for work recently, and it has been interesting to see what choices they've made differently from Zope 3. There were a few things that surprised me:
- The ORM and database layer defaults to autocommit mode rather than using transactions. This seems like an odd choice given that all the major free databases support transactions these days. While autocommit might work fine when a web application is under light use, it is a recipe for problems at higher loads. By using transactions that last for the duration of the request, the testing you do is more likely to help with the high load situations.
- While there is a middleware class to enable request-duration transactions, it only covers the database connection. There is no global transaction manager to coordinate multiple DB connections or other resources.
- The ORM appears to only support a single connection for a request. While this is the most common case and should be easy to code with, allowing an application to expand past this limit seems prudent.
- The tutorial promotes schema generation from Python models, which I feel is the wrong choice for any application that is likely to evolve over time (i.e. pretty much every application). I've written about this previously and believe that migration based schema management is a more workable solution.
- It poorly reinvents thread local storage in a few places. This isn't too surprising for things that existed prior to Python 2.4, and probably isn't a problem for its default mode of operation.
Other than these things I've noticed so far, it looks like a nice framework.
How not to do thread local storage with Python
The Python standard library contains a function called `thread.get_ident()`. It will return an integer that uniquely identifies the current thread at that point in time. On most UNIX systems, this will be the `pthread_t` value returned by `pthread_self()`. At first look, this might seem like a good value to key a thread local storage dictionary with. Please don't do that.
The value uniquely identifies the thread only as long as it is running. The value can be reused after the thread exits. On my system, this happens quite reliably with the following sample program printing the same ID ten times:
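The original sample isn't included in this excerpt; a sketch that shows the same effect by letting each thread exit before starting the next:

```python
import thread
import threading

def report():
    print thread.get_ident()

for i in range(10):
    t = threading.Thread(target=report)
    t.start()
    t.join()  # the thread has exited, so its ID is free for reuse
```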
Psycopg migrated to Bazaar
Last week we moved psycopg from Subversion to Bazaar. I did the migration using Gustavo Niemeyer's svn2bzr tool with a few tweaks to map the old Subversion committer IDs to the email address form conventionally used by Bazaar.
The tool does a good job of following tree copies and creating related Bazaar branches. It doesn't have any special handling for stuff in the tags/ directory (it produces new branches, as it does for other tree copies). To get real Bazaar tags, I wrote a simple post-processing script to calculate the heads of all the branches in a tags/ directory and set them as tags in another branch (provided those revisions occur in its ancestry). This worked pretty well except for a few revisions synthesised by a previous cvs2svn migration. As these tags were from pretty old psycopg 1 releases, I don't know how much it matters.
Psycopg2 2.0.7 Released
Yesterday Federico released version 2.0.7 of psycopg2 (a Python database adapter for PostgreSQL). I made a fair number of the changes in this release to make it more usable for some of Canonical's applications. The new release should work with the development version of Storm, and shouldn't be too difficult to get everything working with other frameworks.
Some of the improvements include:
- Better selection of exceptions based on the SQLSTATE result field. This causes a number of errors that were reported as ProgrammingError to use a more appropriate exception (e.g. DataError, OperationalError, InternalError). This was the change that broke Storm's test suite as it was checking for ProgrammingError on some queries that were clearly not programming errors.
- Proper error reporting for commit() and rollback(). These methods now use the same error reporting code paths as execute(), so an integrity error on commit() will now raise IntegrityError rather than OperationalError.
- The compile-time switch that controls whether the display_size member of Cursor.description is calculated is now turned off by default. The code was quite expensive and the field is of limited use (and not provided by a number of other database adapters).
- New QueryCanceledError and TransactionRollbackError exceptions. The first is useful for handling queries that are canceled by statement_timeout. The second provides a convenient way to catch serialisation failures and deadlocks: errors that indicate the transaction should be retried.
- Fixes for a few memory leaks and GIL misuses. One of the leaks was in the notice processing code that could be particularly problematic for long-running daemon processes.
- Better test coverage and a driver script to run the entire test suite in one go. The tests should all pass too, provided your database cluster uses unicode (there was a report just before the release of one test failing for a LATIN1 cluster).
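As an illustration of the retry pattern that TransactionRollbackError enables (a sketch, not code from the release; `operation` is any callable that issues queries on a cursor):

```python
from psycopg2.extensions import TransactionRollbackError

def run_with_retry(conn, operation, attempts=3):
    # Serialisation failures and deadlocks are transient: roll back
    # and try the whole transaction again.
    for attempt in range(attempts):
        try:
            cursor = conn.cursor()
            operation(cursor)
            conn.commit()
            return
        except TransactionRollbackError:
            conn.rollback()
    raise RuntimeError("transaction failed after %d attempts" % attempts)
```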
If you're using previous versions of psycopg2, I'd highly recommend upgrading to this release.
Running Valgrind on Python Extensions
As most developers know, Valgrind is an invaluable tool for finding memory leaks. However, when debugging Python programs the pymalloc allocator gets in the way.
There is a Valgrind suppression file distributed with Python that gets rid of most of the false positives, but it does not give particularly good diagnostics for memory allocated through pymalloc. To properly analyse leaks, you often need to recompile Python with pymalloc disabled.
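For reference, using the suppression file looks something like this (the file ships in the CPython source tree as Misc/valgrind-python.supp; the script name is a placeholder):

```
valgrind --suppressions=Misc/valgrind-python.supp python myscript.py
```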
As I don't like having to recompile Python I took a look at Valgrind's client API, which provides a way for a program to detect whether it is running under Valgrind. Using the client API I was able to put together a patch that automatically disables pymalloc when appropriate. It can be found attached to bug 2422 in the Python bug tracker.
Two-Phase Commit in Python's DB-API
Marc uploaded a new revision of the Python DB-API 2.0 Specification yesterday that documents the new two-phase commit extension that I helped develop on the db-sig mailing list.
My interest in this started from the desire to support two phase commit in Storm – without that feature there are far fewer occasions where its ability to talk to multiple databases can be put to use. As I was doing some work on psycopg2 for Launchpad, I initially put together a PostgreSQL specific patch, which was (rightly) rejected by Federico.
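The extension adds a small family of `tpc_*` methods to connections. A sketch of how a driver implementing it would be used (psycopg2 gained support in a later release; the connection string and table are hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=example")
xid = conn.xid(42, 'my-global-transaction', 'my-branch')

conn.tpc_begin(xid)
cursor = conn.cursor()
cursor.execute("INSERT INTO events VALUES (1)")
conn.tpc_prepare()   # phase one: make the transaction durable
# ... once every participating resource has prepared successfully ...
conn.tpc_commit()    # phase two
```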
Zeroconf Branch Sharing with Bazaar
At Canonical, one of the approaches taken to accelerate development is to hold coding sprints (otherwise known as hackathons, hackfests or similar). Certain things get done a lot quicker face to face compared to mailing lists, IRC or VoIP.
When collaborating with someone at one of these sprints the usual way to let others look at my work would be to commit the changes so that they could be pulled or merged by others. With legacy version control systems like CVS or Subversion, this would generally result in me uploading all my changes to a server in another country only for them to be downloaded back to the sprint location by others.
Re: Python factory-like type instances
Nicolas:
Your metaclass example is a good example of when not to use metaclasses. I wouldn't be surprised if it executes slightly differently to how you expect. Let's look at how `Foo` is evaluated, starting with what's written:
```python
class Foo:
    __metaclass__ = FooMeta
```
This is equivalent to the following assignment:
```python
Foo = FooMeta('Foo', (), {...})
```
As `FooMeta` has a `__new__()` method, the attempt to instantiate `FooMeta` will result in it being called. As the return value of `__new__()` is not a `FooMeta` instance, there is no attempt to call `FooMeta.__init__()`. So we could further simplify the code to:
urlparse considered harmful
Over the weekend, I spent a number of hours tracking down a bug caused by the cache in the Python `urlparse` module. The problem has already been reported as Python bug 1313119, but has not been fixed yet.
First a bit of background. The `urlparse` module does what you'd expect and parses a URL into its components:
```python
>>> from urlparse import urlparse
>>> urlparse('http://www.gnome.org/')
('http', 'www.gnome.org', '/', '', '', '')
```
As well as accepting byte strings (which you'd be using at the HTTP protocol level), it also accepts Unicode strings (which you'd be using at the HTML or XML content level):
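The trouble comes from the module's result cache: parses are memoised keyed by the URL value, and equal str and unicode URLs share one cache entry, so whichever type is parsed first determines the type of the components everyone gets back. Roughly (a reconstruction from my reading of the bug report, not the post's own example):

```python
>>> from urlparse import urlparse
>>> urlparse('http://www.gnome.org/')[1]
'www.gnome.org'
>>> # The equal unicode URL hits the same cache entry, so the
>>> # components come back as byte strings rather than unicode:
>>> urlparse(u'http://www.gnome.org/')[1]
'www.gnome.org'
```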
Storm Released
This week at the EuroPython conference, Gustavo Niemeyer announced the release of Storm and gave a tutorial on using it.
Storm is a new object relational mapper for Python that was developed for use in some Canonical projects, and we've been working on moving Launchpad over to it. I'll discuss a few of the nice features of the package:
Loose Binding Between Database Connections and Classes
Storm has a much looser binding between database connections and the classes used to represent records in particular tables. The standard way of querying the database uses a store object:
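A sketch of what that looks like (the `User` class and database URI are hypothetical; the API shape is from the Storm tutorial):

```python
from storm.locals import Store, create_database

database = create_database('sqlite:app.db')
store = Store(database)

# Queries go through the store rather than through the class itself,
# so the same class can be used with several databases.
user = store.find(User, User.name == u'alice').one()
```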
ZeroConf support for Bazaar
When at conferences and sprints, I often want to see what someone else is working on, or to let other people see what I am working on. Usually we end up pushing up to a shared server and using that as a way to exchange branches. However, this can be quite frustrating when competing for outside bandwidth when at a conference.
It is possible to share the branch from a local web server, but that still means you need to work out the addressing issues.
Python time.timezone / time.altzone edge case
While browsing the log of one of my Bazaar branches, I noticed that the commit messages were being recorded as occurring in the +0800 time zone even though WA had switched over to daylight saving.
Bazaar stores commit dates as a standard UNIX seconds-since-epoch value and a time zone offset in seconds. So the problem was with the way that time zone offset was recorded. The code in `bzrlib` that calculates the offset looks like this:
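(The excerpt ends before the snippet; the following is a reconstruction of the standard `time.timezone`/`time.altzone` idiom, which from memory is close to what `bzrlib` used.)

```python
import time

def local_time_offset(t=None):
    # Offset of the local zone from UTC in seconds, east positive.
    if t is None:
        t = time.time()
    if time.localtime(t).tm_isdst and time.daylight:
        return -time.altzone
    return -time.timezone
```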
Recovering a Branch From a Bazaar Repository
In my previous entry, I mentioned that Andrew was actually publishing the contents of all his Bazaar branches with his rsync script, even though he was only advertising a single branch. Yesterday I had a need to actually do this, so I thought I'd detail how to do it.
As a refresher, a Bazaar repository stores the revision graph for the ancestry of all the branches stored inside it. A branch is essentially just a pointer to the head revision of a particular line of development. So if the branch has been deleted but the data is still in the repository, recovering it is a simple matter of discovering the identifier for the head revision.
UTC+9
Daylight saving started yesterday: the first time since the 1991/1992 summer for Western Australia. The legislation finally passed the upper house on 21st November (12 days before the transition date). The updated `tzdata` packages were released on 27th November (6 days before the transition). So far, there hasn't been an updated package released for Ubuntu (see bug 72125).
One thing brought up in the Launchpad bug was that not all applications used the system `/usr/share/zoneinfo` time zone database. So other places that might need updating include:
Playing Around With the Bluez D-BUS Interface
In my previous entry about using the Maemo `obex-module` on the desktop, Johan Hedberg mentioned that `bluez-utils` 3.7 included equivalent interfaces to the `osso-gwconnect` daemon used by the method. Since then, the copy of `bluez-utils` in Edgy has been updated to 3.7, and the necessary interfaces are enabled in `hcid` by default.
Before trying to modify the VFS code, I thought I'd experiment a bit with the D-BUS interfaces via the D-BUS Python bindings. Most of the interesting method calls exist on the `org.bluez.Adapter` interface. We can easily get the default adapter with the following code:
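(The excerpt cuts off here; from memory of the BlueZ 3.x D-Bus API, the code was along these lines.)

```python
import dbus

bus = dbus.SystemBus()
manager = dbus.Interface(bus.get_object('org.bluez', '/org/bluez'),
                         'org.bluez.Manager')
adapter_path = manager.DefaultAdapter()   # e.g. '/org/bluez/hci0'
adapter = dbus.Interface(bus.get_object('org.bluez', adapter_path),
                         'org.bluez.Adapter')
```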
Launchpad entered into Python bug tracker competition
The Python developers have been looking for a new bug tracker, and essentially put out a tender for people interested in providing a bug tracker. Recently I have been working on getting Launchpad's entry ready, which mainly involved working on SourceForge import.
The entry is now up, and our demonstration server is up and running with a snapshot of the Python bug tracker data.
As a side effect of this, we've got fairly good SourceForge tracker import support now, which we should be able to use if other projects want to switch away from SF.
Re: Lazy loading
Emmanuel: if you are using a language like Python, you can let the language keep track of your state machine for something like that:
```python
import gobject

def load_items(treeview, liststore, items):
    for obj in items:
        liststore.append((obj.get_foo(),
                          obj.get_bar(),
                          obj.get_baz()))
        yield True
    treeview.set_model(liststore)
    yield False

def lazy_load_items(treeview, liststore, items):
    gobject.idle_add(load_items(treeview, liststore, items).next)
```
Here, `load_items()` is a generator that will iterate over a sequence like `[True, True, ..., True, False]`. The `next()` method is used to get the next value from the iterator. When used as an idle function with this particular generator, it results in one item being added to the list store per idle call until we get to the end of the generator body, where the "`yield False`" statement results in the idle function being removed.
pygpgme 0.1 released
Back in January I started working on a new Python wrapper for the GPGME library. I recently put out the first release:
This library allows you to encrypt, decrypt, sign and verify messages in the OpenPGP format, using gpg as the backend. In general, it stays fairly close to the C API with the following changes:
- Represent C structures as Python classes where appropriate (e.g. contexts, keys, etc). Operations on those data types are converted to methods.
- The `gpgme_data_t` type is not exposed directly. Instead, any Python object that looks like a file object can be passed (including StringIO objects).
- In cases where there are `gpgme_op_XXXX()` and `gpgme_op_XXXX_result()` function pairs, these have been replaced by a single `gpgme.Context.XXXX()` method. Errors are returned in the exception where appropriate.
- No explicit memory management. As expected for a Python module, memory management is automatic.
The module also releases the global interpreter lock over calls that fork gpg subprocesses. This should make the module multithread friendly.
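A short sketch of what using the module looks like (the key ID is a placeholder):

```python
from StringIO import StringIO
import gpgme

ctx = gpgme.Context()
ctx.armor = True
key = ctx.get_key('0123456789ABCDEF')   # placeholder key ID

plaintext = StringIO('hello world')
ciphertext = StringIO()
ctx.encrypt([key], gpgme.ENCRYPT_ALWAYS_TRUST, plaintext, ciphertext)
print ciphertext.getvalue()
```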
Python class advisors
Anyone who has played with Zope 3 has probably seen the syntax used to declare what interfaces a particular class implements. It looks something like this:
```python
class Foo:
    implements(IFoo, IBar)
    ...
```
This leads to the following question: how can a function call inside a class definition's scope affect the resulting class? To understand how this works, a little knowledge of Python metaclasses is needed.
Metaclasses
In Python, classes are instances of metaclasses. For new-style classes, the default metaclass is `type` (which happens to be its own metaclass). When you create a new class or subclass, you are creating a new instance of the metaclass. The constructor for a metaclass takes three arguments: the class's name, a tuple of the base classes, and a dictionary of attributes and methods. So the following two definitions of the class `C` are equivalent:
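(The excerpt ends before the example; the pair of equivalent definitions would have been along these lines.)

```python
class C(object):
    x = 1

# ...is equivalent to calling the metaclass directly:
C = type('C', (object,), {'x': 1})
```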
Version control discussion on the Python list
The Python developers have been discussing a migration off CVS on the python-dev mailing list. During the discussion, Bazaar-NG was mentioned. A few posts of note:
- Mark Shuttleworth provides some information on the Bazaar roadmap. Importantly, Bazaar-NG will become Bazaar 2.0.
- Steve Alexander describes how we use Bazaar to develop Launchpad. This includes a description of the branch review process we use to integrate changes into the mainline.
I'm going to have to play around with `bzr` a bit more, but it looks very nice (and should require less typing than `baz`...)
Overriding Class Methods in Python
One of the features added back in Python 2.2 was class methods. These differ from traditional methods in the following ways:
- They can be called on both the class itself and instances of the class.
- Rather than binding to an instance, they bind to the class. This means that the first argument passed to the method is a class object rather than an instance.
For most intents and purposes, class methods are written the same way as normal instance methods. One place that things differ is overriding a class method in a subclass. The following simple example demonstrates the problem:
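(The excerpt stops before the example; the problem it demonstrates is the naive way of chaining up to the parent implementation. A sketch in the era's pre-decorator style:)

```python
class A(object):
    def greet(cls):
        print 'greet called with', cls.__name__
    greet = classmethod(greet)

class B(A):
    def greet(cls):
        # Naive chain-up: A.greet is bound to A, so the parent sees
        # cls=A even though we were invoked as B.greet().
        A.greet()
        print 'overridden in', cls.__name__
    greet = classmethod(greet)

B.greet()   # the parent prints "A", not "B"
```

The way out is `super(B, cls).greet()`, which rebinds the class method to the class it was actually invoked on.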
Python Challenge
Found out about The Python Challenge. While you don't need to use Python to solve most of the problems, a knowledge of the language certainly helps. While the initial problems are fairly easy, some of the later ones are quite difficult, and cover many topics.
If you decide to have a go, here are a few hints that might help:
- Keep a log of what you do. Solutions to one problem may provide insight into subsequent problems.
- Look at ALL the information provided to you. If the solution isn't apparent, look for patterns in the information and extrapolate.
- If you are using brute force to solve a problem, there is probably a quicker and simpler method to get the answer.
- If you get stuck, check the forum for hints.
There is also a solutions wiki, however, you need to have solved the corresponding problem before it will give you access.
8 April 2005
Tracing Python Programs
I was asked recently whether there was an equivalent of `sh -x` for Python (i.e. print out each statement before it is run), to help with debugging a script. It turns out that there is a module in the Python standard library to do so, but it isn't listed in the standard library reference for some reason.
To use it, simply run the program like this:
```
/usr/lib/python2.4/trace.py -t program.py
```
This'll print out the filename, line number and contents of that line before executing the code. If you want to skip the output for the standard library (i.e. only show statements from your own code), simply pass `--ignore-dir=/usr/lib/python2.4` (or similar) as an option.
Python Unicode Weirdness
While discussing unicode on IRC with owen, we ran into a peculiarity in Python's unicode handling. It can be tested with the following code:
```python
>>> s = u'\U00010001\U00010002'
>>> len(s)
>>> s[0]
```
Python can be compiled to use either 16-bit or 32-bit widths for characters in its unicode strings (16-bit being the default). When compiled in 32-bit mode, the results of the last two statements are `2` and `u'\U00010001'` respectively. When compiled in 16-bit mode, the results are `4` and `u'\ud800'`.