Overview of the pipeline utility process

The pipeline process will take scanned images that are digitized into CZI files and make them available in Neuroglancer. The process involves the following steps:

Extracting TIFFs from CZIs

The user enters the initial information into the database. The entire process depends on this initial step and will not proceed without this information. Throughout the process, the database is checked for variables so it is vital for the correct information to be placed in these tables:
CZI files are placed on a network file system (NFS) which is named birdstore and is mounted at /net/birdstore on our 3 workstations:
1. ratto
2. basalis
3. muralis
The location of the CZI files is: /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/czi
CZI files are large compressed files containing around 20 images and a large amount of metadata that describes the scanner and the scanned images. The images and metadata contained in each CZI file are extracted with the aicspylibczi library
The extracted tiff files are them stored in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/tif
The showinf tool takes the metadata and stores it into the following tables in the database:

Manual quality control of sections

Once the TIF files are placed in the tif directory, the user can perform quality control on the slides in the database. There are around 116 to 166 slides (CZI files) per mouse with each slide usually containing 4 scenes(pictures). During QA on the slides and scenes, the user will replace bad scenes with adjacent good scenes. Entire slides can also be removed by marking the slide appropriately in the portal. Once QA is finished, the user will continue with the pipeline process.
The user then creates the sections and web images. The correct order of slides/scenes is fetched from the database and the files are copied to:
1. /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/full
2. /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail
We then normalize the TIF images intensities to a visiable range and store them in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/normalized

Masking

Masks are then created from the thumbnail images and are placed in: /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_colored/
The masks are created with Pytorch and torchvision and the process is very similar to: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
The users can then add to the mask with a white paintbrush or remove mask sections with a black paintbrush.
When the user is done, the pipeline is run again and the edited masks are extracted.
Masks in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_masked/ are then used to create clean images in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail_cleaned/

Retraing the masking objection detection model

After the masks have been checked for quality, we can take the good masks and retrain the model to make the entire process better. To do that, follow these steps:
1. Use muralis as it has two good GPUs and make sure /net/birdstore is accessible.
2. Use this virtualenv source /usr/local/share/masking/bin/activate It has a newer working version of pytorch and torchvision.
3. Take the good final masks from /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_masked/ and copy them to /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_masked The file names must be named like: DKXX.249.tif. The animal name must be prepended to the actual file name.
4. Take the normalized images from /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CH1/normalized and copy them to /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/normalized Again, prepend the animal name to the file.
5. There must be an equal amount of files in both:
  1. /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/normalized
  2. /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_masked
6. You can now start the training process.
  1. Back up the existing model: mv -vf /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth.bak
  2. In the base part of this repo activate the virtualenv and do: python src/masking/scripts/mask_trainer.py --runmodel false
  3. That will not run the process but will tell you how many images you are working with and if you are using a CPU or GPU. You really need to use a GPU on this process otherwise it will take days to run.
  4. After you are sure you have a viable GPU, do: python src/masking/scripts/mask_trainer.py --runmodel true --epochs 30 That will run the model for 30 epochs. 30 is probably overkill, 20 might do it, I would not go under 15.
  5. After that runs, a new model will be stored in: /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth
  6. You can view a plot of the loss over epochs graph by looking at: /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/loss_plot.png
7. The new model will now be ready to use. You can test it out by removing the files in: /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_colored and rerunning the create_pipeline.py script: python src/pipeline/scripts/create_pipeline.py --animal DKXX --step 1 Go to the /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_colored and verify the masks look good. They should as those images have already been used in the training process. A better test would be to use them on new images.

Section to section alignment

The cleaned images are then aligned to each other using Elastix which is built into the SimpleITK library. Each image is aligned to the image before it in section order. This data is stored in the elastix_transformation table in the database. For each image, the rotation, xshift, and yshift data is stored. This is then used in the alignment process to create a stack of section to section aligned images. These images are then stored in: /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail_aligned/

Creating Neuroglancer data

The aligned images are now ready to be processed into Neuroglancer’s default image type: precomputed
There are two steps to creating the precomputed format:
1. Create the intial chunk size of (64,64,1). Neuroglancer serves data from the webserver in chunks. The initial chunk only has a z length of 1. This is necessary for the initial creation. However, this chunk size results in too many files and needs to be transfered by the next step in the process which creates a better chunk size and results in the pyramid scheme that is best for viewing in a web browser. This data is stored in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/neuroglancer_data/CX_rechunkme
2. The 2nd phase in the precomputed process creates a set of optimum chunks from the directory created in the previous step and places the new pyramid files in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/neuroglancer_data/CX This data is now ready to be served by the Apache web server. Note that all the chunks (and there can be millions of files) are compressed with gzip and so the Apache web server must be configured to serve compressed files. This is done in one of the configuration files under the Apache configuration directory on the web server.
All data in /net/birdstore/Active_Atlas_Data/data_root/pipeline_data/ is available to Neuroglancer. When the user opens up Neuroglancer and enters a URL path in the precomputed field, the URL will actually be pointing to the data on birdstore. For example, typing this URL in Neuroglancer: https://activebrainatlas.ucsd.edu/data/DK39/neuroglancer_data/C1 will be pointing to */net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DK39/neuroglancer_data/C1 on birdstore.