Overview of the pipeline utility process
----------------------------------------
The pipeline takes scanned slides that have been digitized into
CZI files and makes them available in Neuroglancer. The process
involves the following steps:
Extracting TIFFs from CZIs
~~~~~~~~~~~~~~~~~~~~~~~~~~
1. The user enters the initial information into the database. The
entire process depends on this initial step and will not proceed
without this information. Throughout the process, the database is
queried for these values, so it is vital that the correct information
be placed in these tables:
1. ``animal``
2. ``scan_run``
3. ``histology``
2. CZI files are placed on a network file system (NFS) which is named
birdstore and is mounted at /net/birdstore on our 3 workstations:
1. ratto
2. basalis
3. muralis
3. The location of the CZI files is:
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/czi*
4. CZI files are large compressed files containing around 20 images and
a large amount of metadata describing the scanner and the scanned
images. The images and metadata contained in each CZI file are
extracted with the ``aicspylibczi`` library (see the sketch after
this list).
5. The extracted TIFF files are then stored in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/tif*
6. The *showinf* tool reads the metadata, which is then stored in the
following tables in the database:
1. ``slide``
2. ``slide_czi_to_tif``
3. ``scan_run``
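A minimal sketch of the extraction in step 4, assuming the
``aicspylibczi`` package and ``tifffile`` for writing; the file name,
scene/channel indices, and output naming are illustrative, not the
pipeline's actual convention.

.. code-block:: python

   from pathlib import Path

   import numpy as np
   from aicspylibczi import CziFile
   from tifffile import imwrite

   DATA = Path("/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX")
   czi = CziFile(DATA / "czi" / "DKXX_slide001.czi")  # hypothetical file name

   # The XML metadata (scanner settings, channels, scaling) is exposed
   # as an element tree.
   metadata = czi.meta

   # Read one scene/channel; which dimensions exist varies per scanner run.
   image, dims = czi.read_image(S=0, C=0)
   image = np.squeeze(image)

   imwrite(DATA / "tif" / "DKXX_slide001_S0_C0.tif", image)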
Manual quality control of sections
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Once the TIFF files are placed in the tif directory, the user can
perform quality control on the slides in the database. There are
around 116 to 166 slides (CZI files) per mouse, with each slide
usually containing 4 scenes (pictures). During QC on the slides and
scenes, the user will replace bad scenes with adjacent good scenes.
Entire slides can also be removed by marking the slide appropriately
in the portal. Once QC is finished, the user will continue with the
pipeline process.
2. The user then creates the sections and web images. The correct order
of slides/scenes is fetched from the database and the files are
copied to:
1. */net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/full*
2. */net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail*
3. We then normalize the TIFF image intensities to a visible range
(see the sketch after this list) and store them in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/normalized*
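A minimal sketch of the normalization in step 3, assuming 16-bit
grayscale TIFFs and percentile-based contrast stretching; the actual
pipeline may scale intensities differently.

.. code-block:: python

   import numpy as np
   from tifffile import imread, imwrite

   def normalize_to_visible(tif_path, out_path, low=2.0, high=98.0):
       """Stretch intensities into the full 8-bit range via percentile clipping."""
       img = imread(tif_path).astype(np.float64)
       lo, hi = np.percentile(img, (low, high))
       img = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
       imwrite(out_path, (img * 255).astype(np.uint8))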
Masking
~~~~~~~
1. Masks are then created from the thumbnail images and are placed in:
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_colored/*
2. The masks are created with PyTorch and torchvision and the process
is very similar to:
https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
3. The user can then add to the mask with a *white* paintbrush or
remove mask regions with a *black* paintbrush.
4. When the user is done, the pipeline is run again and the edited
masks are extracted.
5. Masks in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_masked/*
are then used to create clean images (see the sketch after this
list) in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail_cleaned/*
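A minimal sketch of step 5, applying a binary mask to produce a
cleaned image. It assumes single-channel thumbnails and 0/255 mask
TIFFs; the real cleaning step may also crop, pad, or rotate.

.. code-block:: python

   import numpy as np
   from tifffile import imread, imwrite

   def clean_image(thumb_path, mask_path, out_path):
       """Zero out everything outside the mask so only tissue remains."""
       img = imread(thumb_path)
       mask = imread(mask_path)
       cleaned = np.where(mask > 0, img, 0).astype(img.dtype)
       imwrite(out_path, cleaned)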
Retraining the masking object detection model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. After the masks have been checked for quality, we can take the good
masks and retrain the model to improve the entire process. To do
that, follow these steps:
1. Use muralis as it has two good GPUs and make sure /net/birdstore
is accessible.
2. Activate this virtualenv:
``source /usr/local/share/masking/bin/activate``. It has a newer
working version of PyTorch and torchvision.
3. Take the good final masks from
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/masks/thumbnail_masked/*
and copy them to
*/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_masked*
The file names must look like *DKXX.249.tif*; that is, the animal
name must be prepended to the original file name.
4. Take the normalized images from
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CH1/normalized*
and copy them to
*/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/normalized*
Again, prepend the animal name to the file name.
5. There must be an equal number of files in both directories (see
the sketch after this list):
1. /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/normalized
2. /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_masked
6. You can now start the training process.
1. Back up the existing model:
``mv -vf /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth /net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth.bak``
2. From the base of this repo, activate the virtualenv and run:
``python src/masking/scripts/mask_trainer.py --runmodel false``
3. That will not run the training but will tell you how many
images you are working with and whether you are using a CPU or GPU.
You really need to use a GPU for this process; otherwise it will
take days to run.
4. After you are sure you have a viable GPU, do:
``python src/masking/scripts/mask_trainer.py --runmodel true --epochs 30``
That will run the model for 30 epochs. 30 is probably
overkill, 20 might do it, I would not go under 15.
5. After that runs, a new model will be stored in:
*/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/mask.model.pth*
6. You can view a plot of the loss over epochs by looking at:
/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/loss_plot.png
7. The new model will now be ready to use. You can test it out by
removing the files in
*/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_colored*
and rerunning the create_pipeline.py script:
``python src/pipeline/scripts/create_pipeline.py --animal DKXX --step 1``
Then go to
*/net/birdstore/Active_Atlas_Data/data_root/brains_info/masks/thumbnail_colored*
and verify the masks look good. They should, as those images have
already been used in the training process. A better test would be
to run the model on new images.
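A minimal sketch of the copy-and-rename in steps 3-5 above: the
animal name is prepended to each file and both training directories
are checked for an equal number of files. Paths and the *DKXX*
placeholder are taken from the steps above.

.. code-block:: python

   import shutil
   from pathlib import Path

   ROOT = Path("/net/birdstore/Active_Atlas_Data/data_root")
   animal = "DKXX"

   pairs = [
       (ROOT / "pipeline_data" / animal / "preps/masks/thumbnail_masked",
        ROOT / "brains_info/masks/thumbnail_masked"),
       (ROOT / "pipeline_data" / animal / "preps/CH1/normalized",
        ROOT / "brains_info/masks/normalized"),
   ]

   for src, dst in pairs:
       dst.mkdir(parents=True, exist_ok=True)
       for tif in sorted(src.glob("*.tif")):
           # Prepend the animal name: 249.tif -> DKXX.249.tif
           shutil.copy2(tif, dst / f"{animal}.{tif.name}")

   # Step 5: both directories must contain the same number of files.
   n_masks = len(list((ROOT / "brains_info/masks/thumbnail_masked").glob("*.tif")))
   n_norm = len(list((ROOT / "brains_info/masks/normalized").glob("*.tif")))
   assert n_masks == n_norm, f"mismatch: {n_masks} masks vs {n_norm} normalized"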
Section to section alignment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. The cleaned images are then aligned to each other using *Elastix*,
which is built into the SimpleITK library. Each image is aligned to
the image before it in section order (see the sketch below). For each
image pair, the rotation, xshift, and yshift are stored in the
elastix_transformation table in the database. These parameters are
then used in the alignment process to create a stack of
section-to-section aligned images, which are stored in:
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/preps/CHX/thumbnail_aligned/*
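A minimal sketch of aligning one section to its predecessor, assuming
a SimpleITK build that includes the Elastix integration
(``ElastixImageFilter``); the parameter map and the exact way the
pipeline stores rotation/xshift/yshift are simplified.

.. code-block:: python

   import SimpleITK as sitk

   def align_pair(fixed_path, moving_path):
       """Rigidly register a section to the previous one.

       Returns (rotation, xshift, yshift).
       """
       elastix = sitk.ElastixImageFilter()
       elastix.SetFixedImage(sitk.ReadImage(fixed_path, sitk.sitkFloat32))
       elastix.SetMovingImage(sitk.ReadImage(moving_path, sitk.sitkFloat32))
       elastix.SetParameterMap(sitk.GetDefaultParameterMap("rigid"))
       elastix.LogToConsoleOff()
       elastix.Execute()
       # For a 2D rigid (Euler) transform the parameters are (angle, tx, ty).
       params = elastix.GetTransformParameterMap()[0]["TransformParameters"]
       rotation, xshift, yshift = map(float, params)
       return rotation, xshift, yshift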
Creating Neuroglancer data
~~~~~~~~~~~~~~~~~~~~~~~~~~
1. The aligned images are now ready to be processed into
Neuroglancer’s default image type: *precomputed*.
2. There are two steps to creating the precomputed format (see the
sketch at the end of this section):
1. Create the initial chunk size of (64,64,1). Neuroglancer serves
data from the webserver in chunks. The initial chunk only has a z
length of 1, which is necessary for the initial creation. However,
this chunk size results in too many files, so the data is
*transferred* by the next step in the process, which creates a
better chunk size and results in the *pyramid* scheme that is
best for viewing in a web browser. This data is stored in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/neuroglancer_data/CX_rechunkme*
2. The second phase of the precomputed process creates a set of
optimally sized chunks from the directory created in the previous
step and places the new pyramid files in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/neuroglancer_data/CX*
This data is now ready to be served by the Apache web server.
Note that all the chunks (and there can be millions of files) are
compressed with *gzip* and so the Apache web server must be
configured to serve compressed files. This is done in one of the
configuration files under the Apache configuration directory on
the web server.
3. All data in
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/* is
available to Neuroglancer. When the user opens Neuroglancer and
enters a URL path in the precomputed field, the URL actually points
to the data on birdstore. For example, typing this URL in
Neuroglancer:
https://activebrainatlas.ucsd.edu/data/DK39/neuroglancer_data/C1
points to
*/net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DK39/neuroglancer_data/C1*
on birdstore.
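A minimal sketch of the two-phase precomputed creation described in
step 2, assuming the ``cloud-volume``, ``taskqueue``, and ``igneous``
packages; resolution, volume size, and data type are placeholders.

.. code-block:: python

   import igneous.task_creation as tc
   from cloudvolume import CloudVolume
   from taskqueue import LocalTaskQueue

   base = "file:///net/birdstore/Active_Atlas_Data/data_root/pipeline_data/DKXX/neuroglancer_data"
   src = f"{base}/C1_rechunkme"
   dst = f"{base}/C1"

   # Phase 1: create the layer with the initial (64, 64, 1) chunks and
   # write each aligned section at its z index.
   info = CloudVolume.create_new_info(
       num_channels=1, layer_type="image", data_type="uint8", encoding="raw",
       resolution=[325, 325, 20000],      # nm per voxel; placeholder values
       voxel_offset=[0, 0, 0],
       chunk_size=[64, 64, 1],
       volume_size=[60000, 34000, 450],   # placeholder x, y, z extent
   )
   vol = CloudVolume(src, info=info)
   vol.commit_info()
   # vol[:, :, z] = section  # one section per z index

   # Phase 2: transfer into web-friendly chunks and build the pyramid.
   tq = LocalTaskQueue(parallel=4)
   tq.insert(tc.create_transfer_tasks(src, dst, chunk_size=[64, 64, 64]))
   tq.insert(tc.create_downsampling_tasks(dst, mip=0))
   tq.execute()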