Mouse data for whole-embryo lineage reconstruction with linajea
This article enables access to the mouse dataset (140521) for "Automated reconstruction of whole-embryo cell lineages by learning from sparse annotations" (Malin-Mayor et al. 2023, DOI: https://doi.org/10.1038/s41587-022-01427-7).
Here we provide the ground truth tracks used to train the deep learning model, the trained networks, and the predicted tracks. Additionally, we provide information on how to access the image data, although it is not uploaded here due to size. Related artifacts include the source code for experiments and methods.
Image Data
The image dataset in n5/zarr format (as used in Malin-Mayor et al. 2023) can be accessed at the following Dropbox link: https://www.dropbox.com/scl/fi/2mt7jxmtl80s3zf2byfyr/140521_mouse.tar.gz?rlkey=n5r311whn8ky4gdabybjdekcc&dl=0. This image dataset was originally published in "In Toto Imaging and Reconstruction of Post-Implantation Mouse Development at the Single-Cell Level" (McDole et al. 2018, DOI: https://doi.org/10.1016/j.cell.2018.09.031), and can also be accessed in the Image Data Resource (IDR) in .klb format, along with associated metadata, at https://idr.openmicroscopy.org/webclient/?show=project-502.
Ground Truth Tracks
Inside gt_tracks.zip there are a number of files containing different subsets of tracks. Each file has the following tab-separated columns: time, z, y, x, cell_id, parent_id, track_id.
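As an illustration of the column layout above, the following is a minimal parsing sketch. It assumes that time and the three ID columns hold integers and that the coordinates are floats; the name TrackPoint and the function read_tracks are hypothetical helpers, not part of linajea. Rows that do not parse (for example a header line, if one is present) are skipped.

```python
import csv
import io
from typing import NamedTuple


class TrackPoint(NamedTuple):
    """One row of a track file (hypothetical helper type)."""
    time: int
    z: float
    y: float
    x: float
    cell_id: int
    parent_id: int
    track_id: int


def read_tracks(handle):
    """Parse tab-separated rows in the order time, z, y, x, cell_id, parent_id, track_id."""
    points = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 7:
            continue  # skip blank or malformed lines
        try:
            t, z, y, x, cell_id, parent_id, track_id = row
            points.append(TrackPoint(int(t), float(z), float(y), float(x),
                                     int(cell_id), int(parent_id), int(track_id)))
        except ValueError:
            continue  # skip non-numeric rows such as a header (assumption)
    return points


# Made-up example rows, not taken from the dataset.
sample = "0\t10.0\t20.0\t30.0\t1\t-1\t1\n1\t10.5\t20.5\t30.5\t2\t1\t1\n"
points = read_tracks(io.StringIO(sample))
```

For a file on disk, the same function can be called on a handle opened with open(path, newline="").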
tracks.txt is the main file containing manual annotations of individual cells from the start to the end of the video, which were used to train the model. These tracks are sparse, but each cell included in tracks.txt had its whole lineage traced as completely as possible from the start to the end of the video.
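Because every row carries a cell_id and a parent_id, a lineage can be followed by walking from a cell to its children. The sketch below assumes that parent_id refers to the cell_id of the same cell (or its mother) at the previous time point, and that cells without a parent use a value that never occurs as a cell_id (here -1); both are assumptions about the file contents, and the function names are hypothetical.

```python
from collections import defaultdict


def build_children(rows):
    """Map each parent_id to the list of its child cell_ids.

    rows is an iterable of (cell_id, parent_id) pairs.
    """
    children = defaultdict(list)
    for cell_id, parent_id in rows:
        children[parent_id].append(cell_id)
    return children


def descendants(children, root):
    """Return root and every cell reachable from it, in depth-first order."""
    out, stack = [], [root]
    while stack:
        cell = stack.pop()
        out.append(cell)
        # Reverse so that children are visited in their original order.
        stack.extend(reversed(children.get(cell, [])))
    return out


# Made-up example: cell 2 divides into cells 3 and 4.
rows = [(1, -1), (2, 1), (3, 2), (4, 2), (5, 3)]
lineage = descendants(build_children(rows), 1)
```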
division_tracks.txt is a different set of manually annotated tracks, where each track is around 5 frames long and is centered on a division.
daughter_cells.txt is a subset of division_tracks.txt containing only the cells directly after a division event. It was generated for convenient and efficient training of models in which divisions are oversampled.
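To illustrate what "cells directly after a division event" means in terms of the columns, the sketch below finds parents that have two or more children and lists those children. This is not the script used to generate daughter_cells.txt; it assumes that parent_id refers to a cell_id present in the same file, and find_divisions is a hypothetical name.

```python
from collections import defaultdict


def find_divisions(rows):
    """Return {parent_id: [daughter cell_ids]} for parents with at least two children.

    rows is an iterable of (cell_id, parent_id) pairs. Parents that do not
    appear as a cell_id (for example a "no parent" marker) are ignored.
    """
    rows = list(rows)
    known = {cell_id for cell_id, _ in rows}
    children = defaultdict(list)
    for cell_id, parent_id in rows:
        if parent_id in known:
            children[parent_id].append(cell_id)
    return {p: sorted(kids) for p, kids in children.items() if len(kids) >= 2}


# Made-up example: two track starts (parent -1) and one division of cell 2.
rows = [(1, -1), (2, 1), (3, 2), (4, 2), (10, -1), (11, 10)]
divisions = find_divisions(rows)
daughters = [cell for kids in divisions.values() for cell in kids]
```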
full_frame_divisions.txt is a set of manually annotated division points (points right before the cell divides) that are as complete as possible for target time points 120, 240, and 360 and the adjacent time frames. It was used for evaluation, not for model training.
Trained Models
trained_networks.zip includes all networks trained on the mouse dataset. The model we suggest using for best performance is described in 140521_mouse_simple_train_all_config.json, and its weights are included in train_net_checkpoint_400000.*. This model was trained and validated on all available ground truth data, and as such is NOT the same as the models used to report results in the paper.
supp_figure_2 includes the configs and models used to report results in Supplemental Figure 2a of the paper and Figure 2a of the main text. We separated the data into train/validation/test splits: "early" (times 50-100), "middle" (times 225-275), and "late" (times 400-450). Each model has two time splits held out for validation and testing, and was therefore trained on the remaining split as well as on all time frames not in any of the splits. For the mouse, this resulted in 3 trained networks for the main ("setup11_simple") architecture.
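The split scheme can be sketched as follows. The time ranges come from the description above; treating both endpoints as inclusive and the num_frames argument are assumptions made for illustration, and the function names are hypothetical.

```python
# Time ranges of the three splits (endpoints assumed inclusive).
SPLITS = {"early": (50, 100), "middle": (225, 275), "late": (400, 450)}


def split_of(frame):
    """Return the name of the split containing frame, or None if it is in no split."""
    for name, (start, end) in SPLITS.items():
        if start <= frame <= end:
            return name
    return None


def training_frames(train_split, num_frames):
    """Frames available for training when the two other splits are held out.

    A frame is kept if it lies in train_split or in none of the splits.
    num_frames is a hypothetical total frame count.
    """
    return [t for t in range(num_frames) if split_of(t) in (None, train_split)]


# Example: the model that trains on "early" holds out "middle" and "late".
frames = training_frames("early", 500)
```

Calling training_frames once per split name gives the three training sets, one for each of the 3 networks.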
supp_figure_6b contains the configs and trained models presented in the ablation study in Supplemental Figure 6B.
Predicted Tracks
predicted_tracks.zip contains both the TGMM baseline results and the results for the linajea method.
tgmm/140521_shifted_TGMM.xml contains the TGMM results provided to us by the authors of the TGMM method.
The linajea results are organized similarly to the trained models. mouse_all_results_071621.txt contains the tracks predicted by the model trained on all ground truth tracks (140521_mouse_simple_train_all). Again, these are NOT the tracks evaluated in the paper, but they are likely to be the most correct, since the model that produced them was trained on the most data.
supp_figure_2 contains the tracks used in Figure 2a of the main text and in Supplemental Figure 2a.
supp_figure_6b contains the tracks used in Supplemental Figure 6B (ablation study).