Janelia Research Campus

Zebrafish data for whole-embryo lineage reconstruction with linajea

dataset
posted on 2024-06-24, 21:41, authored by Caroline Malin-Mayor, Peter Hirsch, Léo Guignard, Katie McDole, Yinan Wan, William C. Lemon, Dagmar Kainmueller, Philipp J. Keller, Stephan Preibisch, Jan Funke

This article provides access to the zebrafish data (160328) for "Automated reconstruction of whole-embryo cell lineages by learning from sparse annotations" (Malin-Mayor et al. 2023, DOI: https://doi.org/10.1038/s41587-022-01427-7).

Here we provide the ground truth tracks used to train the deep learning model, the trained networks, and the predicted tracks. Additionally, we provide information on how to access the image data, although it is not uploaded here due to size. Related artifacts include the source code for experiments and methods.

Image Data

The image dataset in n5/zarr format (as used in Malin-Mayor et al. 2023) can be accessed at the following Dropbox link: https://www.dropbox.com/scl/fi/1qeac8uwvctq9q1y451lg/160328_zebrafish.tar.gz?rlkey=whs5hyxtacdet0ypwyroigara&dl=0. This image dataset was originally published in "Single-cell reconstruction of emerging population activity in an entire developing circuit" (Wan, Y. et al. 2019, DOI: https://doi.org/10.1016/j.cell.2019.08.039). The image dataset has two channels, corresponding to two camera views that have been registered but not merged. Each channel was originally anisotropic, with a voxel size ratio of 6:1. We resampled the channels to be isotropic by downsampling the dimension with larger resolution by 2 and upsampling the two dimensions with smaller resolution by 3; the two factors together account for the 6:1 ratio (2 × 3 = 6).

Ground Truth Tracks

Inside gt_tracks.zip there are a number of files containing different subsets of tracks. Each has the following columns separated by tabs: time, z, y, x, cell_id, parent_id, track_id.
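As an illustration of the column layout described above, the following is a minimal Python sketch for reading one of these files. The names TrackPoint and read_tracks are hypothetical and not part of the linajea code base. The sketch assumes that time and the three id columns are written as integers, that coordinates are floating point, and that a file may or may not start with a header line (any line whose first field is not numeric is skipped).

```python
import csv
from typing import Iterable, List, NamedTuple


class TrackPoint(NamedTuple):
    """One row of a tracks file (hypothetical container, column order as documented)."""
    time: int
    z: float
    y: float
    x: float
    cell_id: int
    parent_id: int
    track_id: int


def _is_number(text: str) -> bool:
    """Return True if the text parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def read_tracks(lines: Iterable[str]) -> List[TrackPoint]:
    """Parse tab-separated rows: time, z, y, x, cell_id, parent_id, track_id."""
    points = []
    for row in csv.reader(lines, delimiter="\t"):
        # Assumption: blank lines and a possible header line are not data rows.
        if not row or not _is_number(row[0]):
            continue
        time, z, y, x, cell_id, parent_id, track_id = row[:7]
        points.append(
            TrackPoint(
                int(time), float(z), float(y), float(x),
                int(cell_id), int(parent_id), int(track_id),
            )
        )
    return points
```

An open file object can be passed directly, for example read_tracks(open("tracks_side_1.txt", newline="")), since the function only needs an iterable of lines.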

tracks_side_1.txt and tracks_side_2.txt are the main files containing the manual annotations of individual cells that were used to train the model. These tracks are sparse, but each included cell had its whole lineage traced as completely as possible from the start to the end of the video. The tracks were split according to which side of the embryo's center line the cells were on. discarded.txt contains tracks that crossed the center line and were not used for training or testing.
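Because each row carries both a cell_id and a parent_id, lineages can be recovered by following parent links. The sketch below shows one way to locate divisions under two assumptions that are not stated in this description: that parent_id holds the cell_id of the same cell (or its mother) in the preceding frame, and that a sentinel value (here -1) marks a cell with no annotated parent. The function name find_divisions is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_divisions(
    links: Iterable[Tuple[int, int]], no_parent: int = -1
) -> Dict[int, List[int]]:
    """Map each dividing cell to its children.

    links holds (cell_id, parent_id) pairs. A cell is treated as dividing
    when two or more rows name it as their parent (an assumption about how
    divisions are encoded, not a documented guarantee).
    """
    children = defaultdict(list)
    for cell_id, parent_id in links:
        # Assumption: no_parent marks the start of a track.
        if parent_id != no_parent:
            children[parent_id].append(cell_id)
    return {
        parent: sorted(kids)
        for parent, kids in children.items()
        if len(kids) >= 2
    }
```

For example, with links [(1, -1), (2, 1), (3, 2), (4, 2)], cell 2 is reported as dividing into cells 3 and 4.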

full_frame_divisions.txt is a set of manually annotated division points (the points right before a cell divides). The annotations are as complete as possible for the target time points 50, 100, 150, 200, 250, 300, and 350 and their adjacent frames. They were used for evaluation only, not for model training.
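When comparing other annotations or predictions against these division points, it can help to restrict rows to the frames around the target time points. The following sketch does this with a configurable window. The window size of 1 is an assumption, since the number of adjacent frames is not specified here, and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Target time points listed in the description above.
TARGET_FRAMES: Tuple[int, ...] = (50, 100, 150, 200, 250, 300, 350)


def near_target_frame(
    time: int, targets: Sequence[int] = TARGET_FRAMES, window: int = 1
) -> bool:
    """Return True if time lies within window frames of any target frame."""
    return any(abs(time - target) <= window for target in targets)


def select_evaluation_rows(
    rows: Iterable[Sequence], window: int = 1
) -> List[Sequence]:
    """Keep rows whose first field (time) is near a target frame.

    Assumption: window=1 is only a placeholder for "adjacent frames".
    """
    return [row for row in rows if near_target_frame(row[0], window=window)]
```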

Trained Models

trained_networks.zip includes both networks trained on the zebrafish dataset. The config files are in the zebrafish_config_files directory, and the other directories are named corresponding to the model_name in each config file and contain the trained model files. There is one model trained/validated on each side of the embryo, and evaluated on the other side, as described in Supplemental Note 1.

Predicted Tracks

predicted_tracks.zip contains both the TGMM baseline results and the results for the linajea method. Each result file has the following columns separated by tabs: time, z, y, x, cell_id, parent_id, track_id, [node_score, edge_score]. The node and edge scores are present only for the linajea predictions.
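Since the two score columns are optional, a reader for these files has to accept rows with either seven or nine fields. The following is a minimal Python sketch of such a row parser. PredictedPoint and parse_prediction_line are hypothetical names, and the sketch assumes integer time and id columns and no header line.

```python
from typing import NamedTuple, Optional


class PredictedPoint(NamedTuple):
    """One predicted row; scores stay None when the columns are absent."""
    time: int
    z: float
    y: float
    x: float
    cell_id: int
    parent_id: int
    track_id: int
    node_score: Optional[float] = None
    edge_score: Optional[float] = None


def parse_prediction_line(line: str) -> PredictedPoint:
    """Parse a tab-separated row with 7 fields, or 9 fields including scores."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (7, 9):
        raise ValueError(
            f"expected 7 or 9 tab-separated fields, got {len(fields)}"
        )
    time, z, y, x, cell_id, parent_id, track_id = fields[:7]
    # Empty for 7-field rows, [node_score, edge_score] for 9-field rows.
    scores = [float(value) for value in fields[7:]]
    return PredictedPoint(
        int(time), float(z), float(y), float(x),
        int(cell_id), int(parent_id), int(track_id),
        *scores,
    )
```

Rows without scores yield node_score and edge_score of None, which makes it straightforward to tell the two kinds of rows apart after loading.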

The TGMM directory contains the TGMM results provided to us by the authors of the TGMM method. There is a separate result for each input channel.

The linajea results are organized similarly to the trained models, with one text file for each side of the embryo. zebrafish_side_1_tracks_071621.txt contains tracks generated by the model trained on side 2 and predicted on side 1, and zebrafish_side_2_tracks_071621.txt contains tracks generated by the model trained on side 1 and predicted on side 2. To save computation, predictions were restricted to the test side using a rough rectangular mask.
