Tjelvar Olsson     About     Posts     Feed     Newsletter

Strategies to access content from Python functions that write to disk

Have you ever worked with an API that has some sort of “save to file” function only to find yourself wanting a function that returns the content to a string? For example the Python image module skimage.io has a function named imsave that takes fname and arr as arguments and writes an image to disk. However, what I wanted was a function that returned the image as a byte string. In other words I wanted the behaviour of the Python Image Library’s PIL.Image.tobytes function. However, I could not find one in scikit-image.

Strategy 1: make use of StringIO

In these types of circumstances one can often make use of Python’s built-in StringIO module. Let’s illustrate this using PIL.

>>> import numpy as np
>>> from PIL import Image
>>> from StringIO import StringIO
>>> ar = np.zeros((50,50), dtype=np.uint8)  # The array we want to get a png byte string for.
>>> img = Image.fromarray(ar)
>>> img = img.convert('RGB')  # Need to convert to RGB to save as PNG.
>>> output = StringIO()
>>> img.save(output, format="PNG")
>>> contents = output.getvalue()
>>> output.close()
>>> assert(isinstance(contents, bytes))

Strategy 2: write, read, delete

However, one cannot use the approach above with skimage.io.imsave as it does not provide a means to specify the format (the format seems to be “automagically” determined from the file name). So we are forced to save the image to disk and then read the contents of the file.

>>> import os
>>> from skimage.io import imsave
>>> imsave('tmp.png', ar)
>>> contents = open('tmp.png', 'rb').read()
>>> os.unlink('tmp.png')
>>> assert(isinstance(contents, bytes))

Strategy 3: create a context manager

The code above above is really ugly. What we want is something that can give us a relatively safe temporary file path and delete it once we are done with it. This is what Python’s context managers are for. Context managers are what lets you use the with statement for opening files etc. Jeff Preshing has written a nice tutorial on context mangers The Python “with” Statement by Example.

Here I will use a test driven development (TDD) approach to illustrate how we can implement a context manager to help us work more safely with temporary file paths. So, before we start working on an implementation let us specify the desired behaviour as a test. Add the code below to a file named tempfilepath.py.

if __name__ == "__main__":
    import os.path
    fpath = None
    with TemporaryFilePath() as tmp:
        assert(os.path.isfile(tmp.fpath))
        with open(tmp.fpath, 'w') as fh:
            fh.write('Testing opening and writing...')
        fpath = tmp.fpath
    assert(not os.path.isfile(fpath))

The code above will raise a NameError stating that the TemporaryFilePath is not defined. Great, now we can start adding an implementation to make the tests pass. I will do this incrementally as it is a useful illustration of some of the aspects of TDD.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

if __name__ == "__main__":
    import os.path
    fpath = None
    with TemporaryFilePath() as tmp:
        assert(os.path.isfile(tmp.fpath))
        with open(tmp.fpath, 'w') as fh:
            fh.write('Testing opening and writing...')
        fpath = tmp.fpath
    assert(not os.path.isfile(fpath))

We now get the error message below.

Traceback (most recent call last):
  File "tempfilepath.py", line 7, in <module>
    with TemporaryFilePath() as tmp:
AttributeError: __exit__

In true TDD style let us add a minimal implementation to make the test pass.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __exit__(self, type, value, tb):
        pass

if __name__ == "__main__":
    import os.path
    fpath = None
    with TemporaryFilePath() as tmp:
        assert(os.path.isfile(tmp.fpath))
        with open(tmp.fpath, 'w') as fh:
            fh.write('Testing opening and writing...')
        fpath = tmp.fpath
    assert(not os.path.isfile(fpath))

The implementation now gives the error below.

Traceback (most recent call last):
  File "tempfilepath.py", line 10, in <module>
    with TemporaryFilePath() as tmp:
AttributeError: __enter__

Let us add the minimal implementation to fix this.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __enter__(self):
        pass

    def __exit__(self, type, value, tb):
        pass

Which reveals the error below.

Traceback (most recent call last):
  File "tempfilepath.py", line 14, in <module>
    assert(os.path.isfile(tmp.fpath))
AttributeError: 'NoneType' object has no attribute 'fpath'

Can you work out what we need to do to fix this? This is a bit subtle, and it caught me out. The clue is that the tmp variable is NoneType, whereas it should have been TemporaryFilePath. This is due to the __enter__ function not returning anything. Let us fix it.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        pass

Now the context manager returns the object type we expect.

Traceback (most recent call last):
  File "tempfilepath.py", line 14, in <module>
    assert(os.path.isfile(tmp.fpath))
AttributeError: 'TemporaryFilePath' object has no attribute 'fpath'

Time to add the fpath attribute.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self):
        self.fpath = 'tmp.txt'

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        pass

Now we are starting to get to the centre of the desired functionality.

Traceback (most recent call last):
  File "tempfilepath.py", line 17, in <module>
    assert(os.path.isfile(tmp.fpath))
AssertionError

At this stage we just want to get the tests to pass so we add a “dumb” implementation.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self):
        self.fpath = 'tmp.txt'
        with open(self.fpath, 'w') as fh:
            pass

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        pass

Which gets us to the second assertion statement.

Traceback (most recent call last):
  File "tempfilepath.py", line 23, in <module>
    assert(not os.path.isfile(fpath))
AssertionError

Basically, we need to add some clean up functionality to the __exit__ function.

import os

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self):
        self.fpath = 'tmp.txt'
        with open(self.fpath, 'w') as fh:
            pass

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        os.unlink(self.fpath)

And that makes all the tests pass. However, this code still has the ugly side-effect of hijacking the tmp.txt file. It is time to refactor the code to make it less nasty. Let us make use of the tempfile module.

import os
import tempfile

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self):
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            self.fpath = tmp.name

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        os.unlink(self.fpath)

Great, everything is working nicely. The only problem is that in order to be able to save an image in png file format we need to be able to specify the suffix of the file name.

At this stage it is very tempting to simply add the desired functionality and it is where test driven development really requires discipline. Let us be good practitioners of TDD and add a test specifying the desired behaviour first.

if __name__ == "__main__":
    import os.path
    fpath = None
    with TemporaryFilePath() as tmp:
        assert(os.path.isfile(tmp.fpath))
        with open(tmp.fpath, 'w') as fh:
            fh.write('Testing opening and writing...')
        fpath = tmp.fpath
    assert(not os.path.isfile(fpath))
        
    with TemporaryFilePath(suffix='.png') as tmp:
        assert(tmp.fpath.endswith('.png'))
        

Great, we now have a failing test.

Traceback (most recent call last):
  File "tempfilepath.py", line 27, in <module>
    with TemporaryFilePath(suffix='.png') as tmp:
TypeError: __init__() got an unexpected keyword argument 'suffix'

Let us continue to work incrementally and only fix the error reported.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self, suffix=None):
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            self.fpath = tmp.name

This moves us on to the actual assertion that we wanted to test.

$ python tempfilepath.py
Traceback (most recent call last):
  File "tempfilepath.py", line 28, in <module>
    assert(tmp.fpath.endswith('.png'))
AssertionError

Let us try to fix it.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self, suffix=None):
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            self.fpath = tmp.name

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        os.unlink(self.fpath)

However, this results in a horrible error message.

Traceback (most recent call last):
  File "tempfilepath.py", line 20, in <module>
    with TemporaryFilePath() as tmp:
  File "tempfilepath.py", line 8, in __init__
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tempfile.py", line 462, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tempfile.py", line 237, in _mkstemp_inner
    file = _os.path.join(dir, pre + name + suf)
TypeError: cannot concatenate 'str' and 'NoneType' objects

The error message above is basically trying to tell us that the suffix argument should not be None by default. We can verify this by looking at the tempfile.NamedTemporaryFile documentation, which states that it should be an empty string. Let us fix our code.

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self, suffix=''):
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            self.fpath = tmp.name

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        os.unlink(self.fpath)

And now all the tests pass. Below is the code in all its glory.

import os
import tempfile

class TemporaryFilePath(object):
    """Context manager for handling temporary file paths."""

    def __init__(self, suffix=''):
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            self.fpath = tmp.name

    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        os.unlink(self.fpath)

if __name__ == "__main__":
    import os.path
    fpath = None
    with TemporaryFilePath() as tmp:
        assert(os.path.isfile(tmp.fpath))
        with open(tmp.fpath, 'w') as fh:
            fh.write('Testing opening and writing...')
        fpath = tmp.fpath
    assert(not os.path.isfile(fpath))
        
    with TemporaryFilePath(suffix='.png') as tmp:
        assert(tmp.fpath.endswith('.png'))

We can now use this to get the content of our numpy array as an image in byte string representation.

>>> from tempfilepath import TemporaryFilePath
>>> with TemporaryFilePath(suffix='.png') as tmp:
... 	imsave(tmp.fpath, ar)
...     content = open(tmp.fpath, 'rb').read()
...
>>> assert(isinstance(content, bytes))