Janelia Research Campus

Mouse data for whole-embryo lineage reconstruction with linajea

posted on 2024-05-09, 17:29 authored by Caroline Malin-Mayor, Peter Hirsch, Léo Guignard, Katie McDole, Yinan Wan, William C. Lemon, Dagmar Kainmueller, Philipp J. Keller, Stephan Preibisch, Jan Funke

This article enables access to the mouse dataset (140521) for "Automated reconstruction of whole-embryo cell lineages by learning from sparse annotations" (Malin-Mayor et al. 2023, DOI: https://doi.org/10.1038/s41587-022-01427-7).

Here we provide the ground truth tracks used to train the deep learning model, the trained networks, and the predicted tracks. Additionally, we provide information on how to access the image data, although it is not uploaded here due to size. Related artifacts include the source code for experiments and methods.

Image Data

The image dataset in n5/zarr format (as used in Malin-Mayor et al. 2023) can be accessed at the following Dropbox link: https://www.dropbox.com/scl/fi/2mt7jxmtl80s3zf2byfyr/140521_mouse.tar.gz?rlkey=n5r311whn8ky4gdabybjdekcc&dl=0. This image dataset was originally published in "In Toto Imaging and Reconstruction of Post-Implantation Mouse Development at the Single-Cell Level" (McDole et al. 2018, DOI: https://doi.org/10.1016/j.cell.2018.09.031). It can also be accessed in .klb format, along with associated metadata, in the Image Data Resource (IDR) at https://idr.openmicroscopy.org/webclient/?show=project-502.

Ground Truth Tracks

Inside gt_tracks.zip there are a number of files containing different subsets of tracks. Each has the following columns separated by tabs: time, z, y, x, cell_id, parent_id, track_id.
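As an illustration of the column layout above, the following is a minimal Python sketch of a reader for these files. It is not part of the linajea code base. It assumes that the files have no header row, that the time and id columns hold whole numbers, and that coordinates are plain decimal numbers. The names TrackPoint and parse_tracks are hypothetical, and the sample rows (including the -1 used as a parent_id) are made up for illustration.

```python
import csv
from typing import Iterable, List, NamedTuple


class TrackPoint(NamedTuple):
    """One row of a track file, in the documented column order."""
    time: int
    z: float
    y: float
    x: float
    cell_id: int
    parent_id: int
    track_id: int


def _as_int(text: str) -> int:
    # Accept both "12" and "12.0"; the exact number formatting is an assumption.
    try:
        return int(text)
    except ValueError:
        return int(float(text))


def parse_tracks(lines: Iterable[str]) -> List[TrackPoint]:
    """Parse tab-separated rows into TrackPoint records, skipping blank lines."""
    points = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        time, z, y, x, cell_id, parent_id, track_id = row
        points.append(TrackPoint(
            _as_int(time), float(z), float(y), float(x),
            _as_int(cell_id), _as_int(parent_id), _as_int(track_id),
        ))
    return points


# Made-up rows for illustration only; these are not values from the dataset.
sample = [
    "0\t10.0\t20.5\t30.0\t1\t-1\t0",
    "1\t11.0\t21.0\t29.5\t2\t1\t0",
]
points = parse_tracks(sample)
print(points[1].parent_id)  # 1
```

To read a real file, an open file handle can be passed in place of the sample list, for example `with open("tracks.txt", newline="") as f: points = parse_tracks(f)`.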

tracks.txt is the main file, containing the manual annotations of individual cells used to train the model. These tracks are sparse, but each cell included in tracks.txt had its whole lineage traced as completely as possible from the start to the end of the video.

division_tracks.txt is a different set of manually annotated tracks, where each track is around 5 frames long and is centered on a division. daughter_cells.txt is a subset of division_tracks.txt containing only the cells directly after a division event. It was generated for convenient and efficient training of models where divisions are oversampled.
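The relationship between division tracks and daughter cells can be sketched with the cell_id and parent_id columns. The Python example below treats a cell as a daughter when its parent has two or more children. This is only an illustration and not the script used to generate daughter_cells.txt. The function name find_daughter_cells, the NO_PARENT value of -1 for cells without a parent, and the sample links are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

NO_PARENT = -1  # assumed marker for "no parent"; check the files before relying on it


def find_daughter_cells(links: Iterable[Tuple[int, int]]) -> Set[int]:
    """Return ids of cells whose parent has two or more children.

    `links` holds (cell_id, parent_id) pairs taken from the track columns.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    for cell_id, parent_id in links:
        children[parent_id].append(cell_id)

    daughters: Set[int] = set()
    for parent_id, kids in children.items():
        # Cells without a parent all share NO_PARENT, so that group is skipped.
        if parent_id != NO_PARENT and len(kids) >= 2:
            daughters.update(kids)
    return daughters


# Made-up lineage: cell 2 divides into cells 3 and 4.
links = [(1, -1), (2, 1), (3, 2), (4, 2), (5, 3), (6, -1)]
print(sorted(find_daughter_cells(links)))  # [3, 4]
```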

full_frame_divisions.txt is a set of manually annotated division points (points right before the cell divides) that is as complete as possible for target time points 120, 240, and 360 and their adjacent time frames. It was used for evaluation, not for model training.
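For readers who want to select the frames covered by these annotations, here is a small Python sketch. The description does not say how many adjacent frames are included, so the radius of one frame on each side is an assumption, and evaluation_frames is a hypothetical helper name.

```python
from typing import Iterable, List

TARGET_TIMES = (120, 240, 360)  # target time points named in the description


def evaluation_frames(targets: Iterable[int] = TARGET_TIMES, radius: int = 1) -> List[int]:
    """List each target frame plus `radius` frames on either side.

    radius=1 is an assumption; adjust it to match the annotated frames.
    """
    frames = set()
    for t in targets:
        frames.update(range(t - radius, t + radius + 1))
    return sorted(frames)


print(evaluation_frames())
# [119, 120, 121, 239, 240, 241, 359, 360, 361]
```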

Trained Models

trained_networks.zip includes all networks trained on the mouse dataset. The model we suggest using for best performance is described in 140521_mouse_simple_train_all_config.json and the weights are included in train_net_checkpoint_400000.*. This model was trained and validated on all available ground truth data, and as such is NOT the same as the models used to report results in the paper.

supp_figure_2 includes the configs and models used to report results in Supplemental Figure 2a of the paper and in Figure 2a of the main text. We defined three time splits: "early" (times 50-100), "middle" (times 225-275) and "late" (times 400-450). Each model has two of these splits held out for validation and testing, and was therefore trained on the remaining split as well as on all time frames not in any of the splits. For the mouse, this resulted in 3 trained networks for the main ("setup11_simple") architecture.
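The split scheme can be sketched in Python as follows. This is an illustration only, not the configuration logic used by linajea. It assumes that each time range is inclusive at both ends, and the names SPLITS, split_of, and role_of are hypothetical.

```python
from typing import Optional

# Time ranges from the description; treating both ends as inclusive is an assumption.
SPLITS = {"early": (50, 100), "middle": (225, 275), "late": (400, 450)}


def split_of(time: int) -> Optional[str]:
    """Return the name of the split containing `time`, or None if it is in no split."""
    for name, (start, end) in SPLITS.items():
        if start <= time <= end:
            return name
    return None


def role_of(time: int, validation_split: str, test_split: str) -> str:
    """Classify a frame as validation, test, or train for one model."""
    split = split_of(time)
    if split == validation_split:
        return "validation"
    if split == test_split:
        return "test"
    # The remaining split and all frames outside the splits are training data.
    return "train"


# Example: a model validated on "early" and tested on "middle".
print(role_of(75, "early", "middle"))   # validation
print(role_of(250, "early", "middle"))  # test
print(role_of(420, "early", "middle"))  # train
print(role_of(150, "early", "middle"))  # train
```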

supp_figure_6b contains the configs and trained models presented in the ablation study in Supplemental Figure 6B.

Predicted Tracks

predicted_tracks.zip contains both the TGMM baseline results and the results for the linajea method.

tgmm/140521_shifted_TGMM.xml contains the TGMM results provided to us by the authors of the TGMM method.

The linajea results are organized similarly to the trained models. mouse_all_results_071621.txt contains the tracks predicted by the model trained on all ground truth tracks (140521_mouse_simple_train_all). Again, these are NOT the tracks evaluated in the paper, but they are likely to be the most accurate, since the model that produced them was trained on the most data.

supp_figure_2 contains the tracks used in the main Figure 2a and in the Supplemental Figure 2a. supp_figure_6b contains the tracks used in Supplemental Figure 6B (ablation study).

