Pincus Lab

Washington University School of Medicine

Department of Developmental Biology
and Department of Genetics

Celltool is a collection of tools for analysis of shapes from images, and in particular, for analyzing cell shapes from micrographs. Celltool provides methods for extracting shapes from images, aligning and measuring those shapes, plotting shapes, and statistically comparing distributions (usually of shape measurements). Celltool is open-source (GPL).

To refer to Celltool in a publication, please cite:

Pincus Z & Theriot JA. Comparison of quantitative methods for cell-shape analysis. J Microscopy 227 pp. 140–156 (2007).

Celltool comprises the following functionality:

  1. Extraction of polygonal contours from images. These contours are the fundamental data produced and consumed by the tools that make up celltool. Contours are extracted as intensity iso-lines from a given image (i.e. by thresholding). If the images are unsuitable for thresholding, otherwise-derived (e.g. manually-defined or generated by an advanced segmentation algorithm) binary masks that already define the individual shapes can be provided. There is no limit to the number of discrete shapes that can be extracted per image.

  2. Scaling, alignment, and re-sampling of polygonal contours. The contours, originally defined in pixel units, can be scaled into their natural length units (e.g. microns) for further analysis. Contours can also be aligned along their long axes, and can be re-sampled (via spline interpolation) such that each contour in a population is a polygon defined by the same number of vertices.

  3. Mutual alignment of contours. Celltool can align a population of contours in space so that corresponding parts of each contour's shape are aligned as well as possible. (This works best with shapes that have stereotyped asymmetries, like front/back or left/right.) Simultaneously, the ordering of the vertices of the polygons is also aligned, so that corresponding parts of each contour's shape have corresponding vertex numbers. That is, once alignment is finished, "point 100" will represent roughly the same position on each contour.

  4. Principal components analysis of population shape variability. Once a population of contours is aligned, a process which removes variability due to the pose and position of the contours relative to one another, the remaining variation in their shape represents the "intrinsic" spread of shapes: the space of possible shapes taken up by that population. To compactly summarize this space, Celltool can perform principal components analysis to decompose this space into a basis set of orthogonal "shape modes", ranked by their order of importance (such a set of modes is referred to by the software as a "shape model"). These shape modes are easy to plot and understand intuitively. Moreover, such modes can be used as a quantitative measure of shape: a given contour's position along several shape modes is a numerical descriptor that can be of great statistical use.

  5. Shape measurement. Celltool can measure a given shape's position along the modes of a provided shape model, as well as other simple measures of shape like area, aspect ratio, perimeter, and "smoothness". In addition, Celltool can perform image-based measurements like calculating the average image intensity within a certain region of an image as defined by the position of the contour. (Example: suppose one has a fluorescence micrograph of the distribution of a certain protein and wishes to measure the average amount of protein only at the front of a set of polarized cells. By converting the cells to contours and aligning them, the user can then direct celltool to provide the "average image intensity for each contour only in the polygon region corresponding to the front of the cell, from the cell border to a depth of one micron" or similar.)

  6. High-quality plotting. Celltool can produce SVG files (easily editable in Adobe Illustrator and the free Inkscape software, amongst others) showing contour shapes, the shape modes from a shape model, and plotting actual shapes against their measured position along those shape modes (or along any arbitrary numerical measure).

  7. Shape-based image manipulation. Celltool can also extract, align, and mask out regions of other images based on the shape of the polygonal contours.

Note well that celltool is a command-line program, with no "graphical user interface". This is, as they say, a feature, not a bug: because these analysis steps are often performed repetitively on different data sets, celltool is written for a text-mode interface, which makes it easy to script and automate. In addition, lists of text commands can be saved to provide a specific record of the analysis performed.

Installation

Files:

Celltool is written in the Python programming language and makes heavy use of the NumPy and SciPy numerical-computing tools. Python is an excellent, simple-to-learn language, and with NumPy, rivals MATLAB as a language for scientific computing. If you are a Python user (or wish to become one), celltool can be compiled from source and installed into your current Python distribution. (You'll just need Python, NumPy, and C/C++/Fortran compilers.) Celltool is compatible with both Python versions 2 and 3. For Windows, there is also a self-contained, double-clickable installer.

Binary Installation on Windows

Download the installer and run it. The celltool program will now be available from your command line. (About which more below.)

Simplified Installation on Mac OS X

(Linux users can do the same, but substitute their package manager for brew below.)

  1. Install the homebrew package manager.
  2. Make sure that the full set of developer tools and libraries is installed: in the terminal, type
    xcode-select --install
  3. Install Python: in the terminal, type brew install python (if you want to use Python 3, substitute python3 for python here and in all commands below).
  4. Use the python package manager to install NumPy and SciPy: in the terminal, type pip install numpy scipy
  5. Download and unzip the celltool source code folder, and from inside the folder run python setup.py install in the terminal. The celltool command should be available after you open a new terminal window.

Installation from Source

This requires:

  1. Python (version 2 or 3 supported)
  2. NumPy
  3. SciPy
  4. Working C/C++ compilers.

Now, the instructions:

  1. Make sure that you have a working version of Python. Open a terminal (Mac/Linux/Unix, hereafter referred to as *nix) or command-prompt (Windows) window, and type python --version. If there is no "command not found" error and the version is at least 2.3, then all is well. Otherwise, visit the Python download page to obtain either the source code to the latest version of the Python language, or a binary installer for the same. The package manager on most Linux systems should allow you to easily fetch and install Python with little fuss.

  2. Make sure you have NumPy and SciPy, the basic numerical and scientific tools for Python. Open a terminal and type python. Now, at the >>> prompt, type import numpy; import scipy. If there is an error, visit the NumPy and SciPy download page to obtain either the source code to the latest version, or a binary installer for the same. (Or use a package manager on Linux.) Follow the instructions there for building and installing NumPy and SciPy.

  3. Build Celltool. Download and extract the celltool source code. Open a terminal and cd to the celltool directory, and then type python setup.py build. If this succeeds, then type sudo python setup.py install to install Celltool for all users on the computer. By default on *nix, celltool tries to place the program scripts in /usr/local/bin; if you prefer them to be located elsewhere, use: sudo python setup.py install --install-scripts=/path/to/install/location. Note that on a Macintosh, /usr/local/bin is not on the PATH by default; thus when you type celltool, the program will not be found. To remedy this, either run /usr/local/bin/celltool, or add /usr/local/bin to the PATH environment variable. (If your shell is bash, which it is by default on OS X, the following will fix things for your user account: cd ~; echo 'PATH=/usr/local/bin:$PATH ; export PATH' >> .bashrc. However, please read more about basic shell usage to understand what this is doing before executing the commands! There are many tutorials on the internet.)

  4. Use celltool. Open a new terminal and type celltool --help. If all worked out properly, celltool should run and display a short help message.
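The checks in steps 1, 2, and 4 above can also be run from Python in one go. The following is a minimal sketch, assuming Python 3 (it uses importlib.util, which Python 2 lacks); the function name check_prerequisites is made up for this illustration and is not part of celltool.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(modules=("numpy", "scipy"), command="celltool"):
    """Report which prerequisites are visible to the current Python."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    # shutil.which searches the directories listed in the PATH variable.
    report[command] = shutil.which(command) is not None
    return report


if __name__ == "__main__":
    for item, status in check_prerequisites().items():
        print(item, status)
```

If the celltool entry comes back False right after installation, the PATH issue described in step 3 is the first thing to check.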

Tutorial

This description assumes that you are familiar with the command-line interface on your computer. If not, please see my brief tutorial. The celltool suite of shape analysis tools is accessed through the command celltool. To see a list of available actions, type celltool --help; to see help for any particular action, type celltool command --help (replacing "command" with the particular action you want to know more about).

The basic workflow is the same for any use of celltool:

  1. Convert images to sets of polygonal contours, one for each cell in the image. Optionally resample the contours so that each has the same number of vertices.

  2. Align the contours.

  3. Make measurements of the individual contours, or of the shape space which the population inhabits (via PCA).

  4. Plot individual contours, the PCA shape space, or the results of various measurements thereof.

First, download the tutorial files, decompress them, and navigate to that directory from the command prompt. There's a directory called "Binary" which contains images of Caulobacter crescentus (already converted to binary images via thresholding, and non-bacterial image blobs manually removed). In particular, there are several images for each of four conditions: bacteria with FtsZ knocked down, MreB knocked down, the double-knockdown, and wild-type. We'd like to compare the shapes of these bacteria under various conditions. Note that the microns-per-pixel scale factor for these images is 0.0680209.

So, step one is to get some polygonal contours for the cells from the binary images. To do so, type (all on one line):

celltool extract_contours --min-area=30 --scale=0.0680209 --units=microns 
--resample-points=100 --smoothing-factor=0.001 --destination=Contours Binary/*.png

(If on Windows, replace the forward-slash path separator with a back-slash...) This directs celltool to extract the contours from all of the PNG files in the "Binary" directory, discarding as junk blobs of less than 30 pixels. Celltool then scales the polygons into micron units and resamples each to have exactly 100 points. The resampling is performed with a spline, and we can control the level of smoothing that the spline performs (we want some smoothing to get rid of the high-frequency square pixel edges). This smoothing factor is in the spatial units of the contours (in our case, microns), and in general, the amount of smoothing should be on the order of 1/100 of the dimension of the cells. (Though be sure to experiment with different amounts of smoothing!) Finally, the processed contours will be written out into a directory called "Contours" (which will be created if it does not exist), and given names derived from their original image. On the Mac, you'll get a pretty little progress bar; on the PC a scrolling list of files processed.
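To make the scaling and resampling steps concrete, here is a rough sketch in Python with NumPy. It is not celltool's implementation: it swaps in plain linear interpolation along the outline instead of a smoothing spline, so it does no smoothing at all, and the function name scale_and_resample is invented for this example.

```python
import numpy as np


def scale_and_resample(points_px, microns_per_pixel, n_points):
    """Scale a closed polygon to microns and resample it evenly by arc length.

    Linear interpolation only: an illustrative stand-in for a smoothing spline.
    """
    pts = np.asarray(points_px, dtype=float) * microns_per_pixel
    closed = np.vstack([pts, pts[:1]])           # repeat first vertex to close the loop
    seg = np.hypot(*np.diff(closed, axis=0).T)   # length of each edge
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Evenly spaced positions along the outline, excluding the duplicate endpoint.
    targets = np.linspace(0.0, arc[-1], n_points, endpoint=False)
    x = np.interp(targets, arc, closed[:, 0])
    y = np.interp(targets, arc, closed[:, 1])
    return np.column_stack([x, y])


if __name__ == "__main__":
    square_px = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(scale_and_resample(square_px, 0.0680209, 100).shape)
```

After this step every contour has the same number of vertices, which is what lets later steps compare contours vertex by vertex.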

Also, note that this command, as with every other shown below, is capable of much more than shown. Make sure to run celltool extract_contours --help (and similar for the other commands) to see explanations of the other possible options!

Once this command has finished, let's plot some of the resulting contours to an SVG file (which can be opened and edited in Illustrator or Inkscape, and can also be viewed in Safari or Firefox):

celltool plot_contours --output-file="Extracted Contours.svg" Contours/WildType06-*.contour
            

This tells celltool to plot all of the contours from the image "WildType06" that we extracted into the "Contours" directory into a file named "Extracted Contours.svg". Opening up the file, we see each bacterium, outlined in a different color, in its original position in the image. (Except the positions have been reflected across the horizontal, because the typical image coordinate system starts at the top left, while the most natural coordinate system for plotting starts at the bottom left.)

If we want to measure the shape variability of the contours, the current situation is no good: the major differences between each contour so far have to do with position and orientation, not intrinsic shape! So we must align the contours to one another. We could do this in several ways:

  1. Mutually align all of the contours to one another.
  2. Mutually align some subset of the contours, make a PCA "shape model" from that subset, and align the other contours to the mean shape from that shape model.
  3. Select some "representative" shape and align all the others to that one.

Your choice depends on the context. Here, the context is pedagogical, so we will select option 2, which shows off the most different features of the tools!

Let's first align the wild-type contours to one another. This is an iterative process in which the shapes are translated to the origin, and then the average shape is found. This average is of course a mess, because the polygons are all oriented differently. Worse, since each polygon is a list of vertices, the ordering of vertices matters: some might have the "first point" near the bacterial pole, while others have it elsewhere. Nevertheless, there will be some small statistical signal in this average. We then use that average to re-align all of the contours (both spatially and in terms of vertex ordering). This accomplished, we can re-compute a new average that will be better than the old one. These two steps are iterated until no contour changes position.

celltool align_contours --allow-reflection --destination=Aligned Contours/WildType*.contour
            

This command selects only the contours derived from images whose names start with "WildType" and mutually aligns them, allowing the shapes to be reflected if that enhances the alignment. (For polarized cells with distinct tops and bottoms, allowing reflection might not be such a good idea...)
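The iterative procedure described above can be sketched in a few lines of NumPy. This is only an illustration under simplifying assumptions, not celltool's actual algorithm: contours are stored as complex numbers (x + iy), every cyclic vertex ordering is tried by brute force, reflection is not handled, and the function names are made up.

```python
import numpy as np


def align_to_reference(contour, reference):
    """Rotate and cyclically re-order one centered contour to best match reference.

    Both arguments are 1-D complex arrays (x + 1j*y) of equal length.
    """
    best_score, best = -1.0, contour
    for shift in range(len(contour)):
        candidate = np.roll(contour, shift)
        overlap = np.vdot(candidate, reference)  # sum(conj(candidate) * reference)
        if abs(overlap) > best_score:
            best_score = abs(overlap)
            # Multiplying by the unit complex number overlap/|overlap| applies the
            # least-squares optimal rotation for this vertex ordering.
            best = candidate * (overlap / abs(overlap)) if overlap != 0 else candidate
    return best


def mutually_align(contours, max_iterations=50):
    """Iteratively align (n_contours, n_points) complex contours to their mean."""
    shapes = np.asarray(contours, dtype=complex)
    shapes = shapes - shapes.mean(axis=1, keepdims=True)  # translate to the origin
    for _ in range(max_iterations):
        mean = shapes.mean(axis=0)                         # current average shape
        updated = np.array([align_to_reference(s, mean) for s in shapes])
        if np.allclose(updated, shapes):                   # no contour moved: done
            break
        shapes = updated
    return shapes
```

Each pass can only reduce (or leave unchanged) the total squared distance of the contours from their mean, which is why repeating the two steps settles down, although not necessarily at the best possible alignment.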

Let's plot the output of the alignment. Instead of coloring each contour separately, we'll now color the contours by the point ordering of their vertices:

celltool plot_contours --color-by=points --output-file="Aligned Wild-Type.svg" Aligned/*.contour
            

Not too shabby! From this, we can see more or less what kind of shapes the wild-type cells can attain. To make this quantitative, let's build a PCA "shape model". This procedure treats each polygon simply as a point in 200-dimensional space (100 vertices, each with an x- and y-coordinate) and finds the directions away from the mean shape which account for the most variance. These directions can then be interpreted as deformations away from the mean shape for easy and intuitive visualization.

celltool shape_model --variance-explained=0.95 --output-prefix="WT Model" Aligned/*.contour
            

This tells celltool to make a shape model with enough "principal shape modes" to account for 95% of the total variance. Several files will be created, all prefixed with "WT Model". The files "WT Model-normalized-positions.csv" and "WT Model-positions.csv" contain the position of each cell with regard to the principal shape modes as (Excel-readable) comma-separated value files. The normalized positions are reported in terms of standard deviations from the mean along each shape mode, which thus disregards the differences in overall variation explained by each mode. (More about this later.) Finally, "WT Model.contour" is also created: this is a special contour file that contains both the mean shape of the wild-type population as well as the principal modes of variation about that mean.
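For readers who want to see what such a model amounts to numerically, here is a minimal PCA sketch with NumPy. It is an assumption-laden illustration rather than celltool's code: the function name build_shape_model and the returned dictionary keys are invented, and the variance is computed with the usual n - 1 divisor.

```python
import numpy as np


def build_shape_model(contours, variance_explained=0.95):
    """PCA on aligned contours given as an (n_contours, n_points, 2) array."""
    data = np.asarray(contours, dtype=float)
    n = data.shape[0]
    flat = data.reshape(n, -1)              # each contour becomes one long vector
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are orthonormal directions of decreasing variance.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular ** 2 / (n - 1)
    cumulative = np.cumsum(variances) / variances.sum()
    # Smallest number of modes whose cumulative fraction reaches the target.
    k = min(int(np.searchsorted(cumulative, variance_explained)) + 1, len(variances))
    modes = vt[:k]
    positions = centered @ modes.T                    # raw coordinates along each mode
    normalized = positions / np.sqrt(variances[:k])   # in standard deviations
    return {
        "mean": mean.reshape(data.shape[1:]),
        "modes": modes.reshape((k,) + data.shape[1:]),
        "variance_fraction": variances[:k] / variances.sum(),
        "positions": positions,
        "normalized_positions": normalized,
    }
```

The two position arrays mirror the distinction drawn above: the raw positions keep each mode's true spread, while the normalized ones rescale every mode to unit standard deviation.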

Let's see what these modes look like:

celltool plot_model "WT Model.contour"
            

Which produces a file called "WT Model.svg" (though this can of course be controlled). Opening that, we see the two principal modes of shape deformation, illustrated by the mean shape superimposed on shapes several standard deviations away from that mean along each mode. The first mode, which accounts for 90% of the variability in the wild-type population, is a combined size-and-shape mode that appears to recapitulate the cell cycle (more or less). The second mode describes how bent any individual cell is.

What is the configuration of the individual cells in this two-dimensional PCA "shape space"? We can examine this by plotting the distribution of shapes in the shape space:

celltool plot_distribution --x-column=2 --y-column=3 
--output-file="WT normalized positions.svg" "WT Model-normalized-positions.csv"
Aligned/*.contour

This causes celltool plot_distribution to read the entries of the file "WT Model-normalized-positions.csv". It reads the contour names from the first column, and then plots the shape of each contour (read from the contour files) at the (x,y) position specified by the second and third columns in that file. Thus, we can clearly see how cells with different positions on the shape modes have different shapes. Try the command again, but with the non-normalized position file: this shows the shapes more or less as they are arranged in the original shape space. Here, it is clear that the amount of variability along the first shape mode dwarfs that along the second -- a fact which is obscured by measuring both in terms of standard deviations, as in the initial plot. (However, that plot is easier to read...) If you've opened the plot in Illustrator, notice that the different plot elements are conveniently grouped and organized into layers. You can navigate through the layers using the "Layers" palette, and if you drill down far enough, you will see that each shape is a named object, so that you can trace it back to the contour (and thus image) file it came from.

Also, note that the celltool plot_distribution command can read any csv file -- as long as there's a column that contains the contour names, it can plot them; thus you can transform the data in Excel or whatnot and plot that (or other data produced by other means such as the celltool measure_contours command, about which see below).

Now, let's align the remaining contours to the mean wild-type shape. First, let's just look at that mean shape though... To do this, we can tell celltool plot_contours (not celltool plot_model!) to plot the shape model contour. The command will just read the mean shape from the model file and plot that. (In the following example, we'll also turn on some other neat features of the command):

celltool plot_contours --color-by=points --label-points
--output-file="WT Mean.svg" "WT Model.contour"

Looking at this, we see that perhaps this mean shape isn't exactly oriented as we'd like: the first point is somewhere near the middle of the contour (maybe it would be easier for subsequent measurements if it were at a pole?), and the contour is oriented horizontally (maybe we find it more aesthetically pleasing if it were vertical?). We can modify the contour file with the celltool modify_contours command, which can rotate and set the scale factor and units, as well as reorder the points of one or more contours:

celltool modify_contours --rotate=90 --first-point=71 "WT Model.contour"
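Geometrically, the two modifications requested here are a rotation and a cyclic re-ordering of the vertex list. The sketch below shows both in plain Python. It rests on assumptions that the text does not confirm: that points are numbered from 1, that the rotation is counterclockwise about the origin, and the function name itself is hypothetical.

```python
import math


def rotate_and_reorder(points, degrees, first_point):
    """Rotate a contour about the origin and make `first_point` the new start.

    `points` is a list of (x, y) tuples; `first_point` is 1-based (an assumption).
    """
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    # Standard counterclockwise rotation (the direction is an assumption).
    rotated = [(x * c - y * s, x * s + y * c) for x, y in points]
    start = first_point - 1
    # Cyclic shift: the outline is unchanged, only the numbering moves.
    return rotated[start:] + rotated[:start]
```

Neither operation changes the shape itself, only its pose and the numbering of its vertices.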
            

When a destination directory is not specified, celltool just writes the resulting contours out into the current directory, as described in the help. This will cause the "WT Model.contour" file to get overwritten with the new one... Anyhow, we can now plot the new mean contours (just as above) to see the results. Now, let's align all of the contours to this mean shape (we need to re-align the wild-type contours to the modified mean as well):

celltool align_contours --reference="WT Model.contour" --destination=Aligned Contours/*.contour
            

Now, we'd like to look at the position of each different population of cells in the shape space to see if the perturbations did indeed modify the cell shapes. But first, a question arises: we have made a "wild-type" shape model. Should we measure all the shapes against this wild-type model, or should we make a new model that encompasses the variability in the perturbed populations as well? This will depend on the context, of course, but either approach is in principle reasonable. Note however that shape changes not captured by the modes in a model will just "disappear" -- so two populations with very different shapes could occupy the same position in the shape space described by a particular shape model, provided that the shape model doesn't have an appropriate mode to capture the variability between the populations. So, in this case, let's make a model from all of the contours:

celltool shape_model --variance-explained=0.95 --output-prefix=AllModel --no-data Aligned/*.contour
            

Here, we make a shape model from all of the contours, and also tell celltool not to write out the csv data file. (Why? Well, it happens that celltool plot_distribution can read in multiple csv files and color the contours from each different file differently, so we can easily differentiate between the treatments. We could take the data file generated by the celltool shape_model command and manually copy-paste it into several files (or transform via grep, etc); alternately, there are options in celltool plot_distribution to treat certain ranges of row numbers as separate groups. However, the approach shown below seems simplest.)

We could plot the model as above (please do so!) and so forth. In this case, the model from all the cells is substantially similar to that from the wild-type cells, but it was good to check.
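The earlier caveat, that shape changes not captured by a model's modes just "disappear", is easy to demonstrate with a toy example. The sketch below uses a made-up three-dimensional "shape space" and a one-mode model; all names and numbers are illustrative only.

```python
import numpy as np


def position_on_modes(shape, mean, modes):
    """Project a flattened shape onto orthonormal shape modes (rows of `modes`)."""
    return (np.asarray(shape, dtype=float) - mean) @ np.asarray(modes).T


# Toy 3-dimensional "shape space" with a one-mode model along the first axis.
mean = np.zeros(3)
modes = np.array([[1.0, 0.0, 0.0]])

shape_a = np.array([0.5, 0.0, 0.0])
shape_b = np.array([0.5, 2.0, -1.0])    # differs from shape_a only off-model

pos_a = position_on_modes(shape_a, mean, modes)
pos_b = position_on_modes(shape_b, mean, modes)
# The part of shape_b that the model cannot represent.
residual_b = (shape_b - mean) - pos_b @ modes

print(pos_a, pos_b, residual_b)
```

Both shapes land at the same position on the single mode even though they differ substantially; the difference lives entirely in the residual, which a position measurement never reports.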

Now let's get to new ground: measuring contour parameters. This is done with celltool measure_contours, which is an extremely flexible and thus complex tool. Basically, it takes as input a set of contours and a set of measurements to make, and outputs csv files of the measurements. Some measurements are easy to specify with a single option flag (like --area to cause the contour areas to be measured), but others are more complex and require further "sub options". So, for example, to tell celltool to measure the positions of a set of contours on a particular shape model, we need to specify the shape model file, the modes to measure, and so forth. When we're done specifying "sub options", we need to tell celltool that by putting a '-' on the command line. Here's an example:

celltool measure_contours --output-file="wild-type.csv" --area --centroid
--shape-modes AllModel.contour 1 2 - Aligned/WildType*.contour

This creates a csv file with six columns: the contour names, their areas (in square microns), their x and y centroids in the original images (in pixels), and their positions along modes 1 and 2 of the AllModel.contour shape model. Note the '-' in the command-line above to tell the program that we're done specifying sub-options for the 'shape-modes' measurement, and that subsequent entries are other measurements to make or (as above) contours to measure.
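For reference, the area and centroid of a polygonal contour can be computed directly from its vertices with the shoelace formula. This is a generic geometric sketch in plain Python, not celltool's code; it works in whatever single unit the vertices are given in, and the function name is made up.

```python
def polygon_area_and_centroid(points):
    """Shoelace-formula area and centroid of a simple closed polygon.

    `points` is a list of (x, y) tuples; the closing edge is added implicitly.
    """
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Dividing by the signed area keeps the centroid correct for either winding.
    centroid = (cx / (3.0 * twice_area), cy / (3.0 * twice_area))
    return abs(twice_area) / 2.0, centroid
```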

There are many other measurements available; consult celltool measure_contours --help for more information. In particular, some measurements are defined in terms of the point positions along the line. You can, for example, measure the path length along the inner curved side of the bacterium with --path-length --begin=1 --end=45 - (again, remember the '-' to denote that we're done with options specific for the --path-length measurement). To measure the entire perimeter, just omit the (optional) --begin and --end options and use --path-length -. Other measurements can be made of image intensity in (e.g. fluorescence) micrographs; with these you need to list the relevant images as part of the "sub options".
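A path-length measurement over a range of points boils down to summing edge lengths between consecutive vertices. The plain-Python sketch below assumes 1-based, inclusive point numbers and does not handle ranges that wrap past the last point; how celltool treats those details is not stated here, and the function name is hypothetical.

```python
import math


def path_length(points, begin=None, end=None):
    """Length along a closed contour from point `begin` to point `end`.

    Point numbers are 1-based and inclusive (an assumption). With no range
    given, the full perimeter, including the closing edge, is returned.
    """
    if begin is None and end is None:
        path = points + points[:1]          # close the loop for the perimeter
    else:
        first = 0 if begin is None else begin - 1
        last = len(points) if end is None else end
        path = points[first:last]
    return sum(math.hypot(q[0] - p[0], q[1] - p[1])
               for p, q in zip(path, path[1:]))
```

Because aligned contours share a vertex numbering, the same begin and end values pick out roughly the same stretch of outline on every cell.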

Let's now measure the rest of the bacterial conditions:

celltool measure_contours --output-file="double.csv" --area --centroid
--shape-modes AllModel.contour 1 2 - Aligned/Double*.contour
celltool measure_contours --output-file="FtsZ-.csv" --area --centroid
--shape-modes AllModel.contour 1 2 - Aligned/FtsZ-*.contour
celltool measure_contours --output-file="MreB-.csv" --area --centroid
--shape-modes AllModel.contour 1 2 - Aligned/MreB-*.contour
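Since these three commands differ only in the output file and the file-name prefix, they are a natural candidate for scripting. Below is one possible sketch in Python 3 that assembles the same arguments shown above; the helper names are made up, and by default it only prints the commands rather than running them.

```python
import glob
import subprocess

# Output file -> contour file-name prefix, as used in the commands above.
CONDITIONS = {"double.csv": "Double", "FtsZ-.csv": "FtsZ-", "MreB-.csv": "MreB-"}


def build_measure_command(output_file, contour_files, model="AllModel.contour"):
    """Assemble the measure_contours argument list used in the tutorial."""
    return (
        ["celltool", "measure_contours", "--output-file=" + output_file,
         "--area", "--centroid",
         "--shape-modes", model, "1", "2", "-"]   # '-' ends the sub-options
        + list(contour_files)
    )


def measure_all(dry_run=True):
    """Print, and optionally run, one command per condition."""
    for output_file, prefix in CONDITIONS.items():
        # subprocess does not expand wildcards, so glob does it explicitly.
        contours = sorted(glob.glob("Aligned/" + prefix + "*.contour"))
        command = build_measure_command(output_file, contours)
        print(" ".join(command))
        if not dry_run and contours:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    measure_all(dry_run=True)
```

Saving a script like this alongside the data also serves as a record of exactly which measurements were made.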

And now plot the results:

celltool plot_distribution --x-column=5 --y-column=6
--output-file="Strain Positions.svg" wild-type.csv double.csv FtsZ-.csv MreB-.csv Aligned/*.contour

As we can see, the different populations live in overlapping but relatively distinct regions of the shape-space. We can also look at the marginal distribution along just one axis:

celltool plot_distribution --x-column=Area
--output-file="Areas.svg" wild-type.csv double.csv FtsZ-.csv MreB-.csv

Here, we specify only an x-column (and exercise the functionality that allows us to type the name of the column header instead of its number), and we don't need to specify any contour files, because we'll be looking at the marginal distribution of areas (i.e. a "smoothed histogram"; actually a kernel density estimate of the marginal).
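For the curious, a kernel density estimate of this sort can be written in a few lines of plain Python. The sketch below uses Gaussian kernels with a common normal-reference rule-of-thumb bandwidth; that bandwidth choice is an assumption made for illustration, and celltool may well choose differently.

```python
import math


def gaussian_kde(values, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate.

    Needs at least two distinct values when the bandwidth is not supplied.
    """
    n = len(values)
    if bandwidth is None:
        mean = sum(values) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        bandwidth = 1.06 * std * n ** (-1.0 / 5.0)   # rule-of-thumb bandwidth
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One Gaussian bump per data point, summed and normalized.
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density
```

Evaluating the returned function on a grid of area values gives the smooth curve; unlike a histogram, it does not depend on where bin edges fall.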

This concludes the basic tutorial. There are other celltool commands that haven't been covered, including add_landmarks, which can be used to add "landmark points" (e.g. the rear of a crawling cell, or a bacterial pole) from images. These landmarks facilitate alignment of shapes which are difficult to align properly based on shape alone. There is the extract_images command, which can "snip" out image regions corresponding to various contours, and also the image-processing measurements available in measure_contours, including measures of integrated image intensity within an entire shape, and also within certain geometric regions defined by the shape's outline. (For example, you could measure the total intensity within 1 micron inward of contour points 10 to 30, or you could measure the profile of fluorescence along or inward from the leading edge of a crawling cell.)