POSKeyError During Commit

A terrifying tale of ZODB errors that only showed during the hardest time to debug.

Believe it or not, I’m a POSKeyError virgin, I’ve never run into one… Until now!

I was working on a very small Plone 3.1.1 to 4.2.2 upgrade. Not too many add-ons, simple skin/theme customizations, 1,810 objects in ~800MB. No problem, right?

Wrong. I got a POSKeyError after the upgrade step that converts files and images to BLOBs. The wierd part is that it only happened during the transaction.commit(), not during ZODB object access:

  File "/opt/src/eggs/Zope2-2.13.18-py2.7.egg/Zope2/App/startup.py", line 301, in commit
  File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_manager.py", line 89, in commit
    return self.get().commit()
  File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_transaction.py", line 329, in commit
  File "/opt/src/eggs/transaction-1.1.1-py2.7.egg/transaction/_transaction.py", line 443, in _commitResources
  File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/Connection.py", line 572, in commit
    oid, serial, transaction)
  File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/BaseStorage.py", line 416, in checkCurrentSerialInTransaction
    committed_tid = self.getTid(oid)
  File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/FileStorage/FileStorage.py", line 770, in getTid
    pos = self._lookup_pos(oid)
  File "/opt/src/eggs/ZODB3-3.10.5-py2.7-linux-x86_64.egg/ZODB/FileStorage/FileStorage.py", line 403, in _lookup_pos
    raise POSKeyError(oid)
POSKeyError: 0x055350

I tried all the tools in ZODB.scripts and zc.zodbdgc but they couldn’t tell me anything about the oid in question. That’s probably because it was a different oid every time I attempted the upgrade step on the same ZODB.

Without being able to use the typical tools to find out what ZODB objects were referring to those oids I was really stuck. Then David Glick came to my rescue. Actually, he was staying at my apartment so I drafted him. :-) He suggested that I might have to create a mapping of oids to stack frames at the time when objects were added to the transaction so that I could figure out what had been going on once I knew which oid would be the problem. This worked very well and much thanks to David.

I added the following to the place where the objects were being added to the transaction, ZODB.Connection.Connection.readCurrent():

if not hasattr(self, 'XXX'):
    self.XXX = {}
self.XXX[ob._p_oid] = sys._getframe()

Then once I had my POSKeyError in pdb.post_mortem() I could invoke pdb at the place where the offending oid had been added:

import pdb; pdbr = pdb.Pdb() ; pdbr.reset() ; pdbr.interaction(self.XXX[oid], None)

Using that I was able to find out what objects were involved. They were all portal_catalog/path index Trees.IIBTree.IITreeSet objects. What’s odd is that the error happens even if I do a catalog clear and rebuild first.

After much more debugging and discussion, we determined that the ZODB.Connection.Connection.readCurrent() support was involved somehow. Also, as I deleted the images that were involved I noticed another migration error earlier on went away for each image I removed. That’s the key, when the BLOB migration fails for one file or image, it uses a savepoint to rollback just that image’s migration and proceed with the rest. It sure seems like there’s a bug in the ZODB where when a savepoint is rolled back the objects registered with the transaction are cleaned up but the objects registered for readCurrent are not. Gee, someone should write a test case. :-)

In the meantime, it turns out the reason the image migrations were failing was something else all together but I thought I should share this debugging technique for capturing stack information when you can’t know which stack you need to inspect until later. I also hope this ZODB bug gets squashed.

Updated on 21 November 2012

Imported from Plone on Mar 15, 2021. The date for this update is the last modified date in Plone.


comments powered by Disqus