
Changelog

This is a general and non-exhaustive changelog of the major changes at certain milestones of the repository's development, marked by the version tags. For some future directions, visit the projects page.

v3.0 (current)

  • Abstracted the common dataset logic into BaseDataset and moved functions shared by all dataloaders to base_dataset.
  • Finished converting the main dataloaders to use composable functions based on augmennt, with a focus on reusable functions and performance improvements.
  • Merged all previous LRHR*_dataset loaders into a single aligned_dataset, modified LR_dataset into single_dataset for single-image dataset loading and added the unaligned_dataset for training with unpaired images.
  • Added an independent weighted sampler for multiple distributions with concatenated datasets: MultiSampler, used with the concat_unaligned dataset mode.
  • Added support to use PIL as the image backend (with Torchvision transforms), in parallel to the existing OpenCV2 backend (with augmennt composable functions). PIL options and functionality are more limited, but can be extended as needed; CV2 remains the main backend to use due to its better performance and the optimized functions available. Minor notes: added a pre_crop option (option: pre_crop) to accelerate steps in the pipeline that perform resizing before cropping (like "random downscale HR/B" and when generating the LR/A image on the fly), and remade the hr_rot function for accurate free-range random rotations (option: use_hrrot).
  • Enabled additional augmentations from augmennt: cv2 k-means-based color quantization (km_quantize), an option to compress images with WebP or JPEG (webp, jpeg), exposed the bilateral filter transform, extended the new ApplyKernel function for convolving with arbitrary kernels (besides estimated image kernels: KernelGAN) to add simple and complex motion blur transforms with randomly generated motion kernels (motion, complexmotion), CLAHE augmentation (clahe), superpixels (superpixels), raw camera noise via the unprocess/process pipeline (camera), new anisotropic and isotropic Gaussian blur filters (iso, aniso) and a sinc (sinc) filter. Also updated the Gaussian noise, speckle noise and chromatic aberration augmentations to new versions.
  • Added new interpolation kernel options for use with the accurate and antialiased Matlab-like imresize function, intended to train models with a wider variety of kernels, including some that are specific to certain use cases. All the methods available are: blackman5, blackman4, blackman3, blackman2, sinc5, sinc4, sinc3, sinc2, gaussian, hamming, hanning, catrom, bell, hermite, mitchell, cubic (bicubic), lanczos5, lanczos4, lanczos3, lanczos2, box, linear (bilinear). The OpenCV methods now have the prefix cv2_ and realistic kernels (i.e. extracted with KernelGAN) can be combined during training. Note: using PIL as the image backend currently only allows resizing with the PIL bicubic method.
  • Extended OTF augmentations pipeline with second and final blur, additional compression steps, in-pipeline resizing and random shuffling flag, among others. The original pre-pipeline resizing strategy can still be used with resize_strat: 'pre' and in-pipeline with resize_strat: 'in'.
  • Added special scaling cases: down_up to randomize scales when resizing in-pipeline and nearest_aligned, which prevents the misalignment of 0.5 × (s − 1) pixels when using regular nearest neighbor downscaling. Additionally, when used in-pipeline, random up/down/keep probabilities and scale ranges can be defined, with adaptive scale in case images are already at the final target scale.
  • Added an option to select which kernel scale to use for the realistic kernels option.
  • Updated the imresize function with a new version adapted from ResizeRight to solve the incorrectness issues from standard frameworks (OpenCV, PIL, Torch, TensorFlow, others).
  • Added the Adaptive Target Generator (ATG) and strategy from AdaTarget to spatially match the network outputs to target images. More information in AdaTarget. Option: use_atg.
  • Added an option to use Pixel-Unshuffle, both to fully replicate the Cutblur batch augmentation strategy and to convert the input images to lower resolution with extended depth, reducing the resources required to train them. Option: use_unshuffle.
  • Included a flag to select the image pre-processing option best suited to the model training task (necessary for image-to-image translation options). Besides the default crop, which crops patches for super-resolution, denoising and deblurring, there are now also resize, scale_width, scale_height, scale_shortside and fixed for image-to-image translation and other cases. These can also be combined using _and_ as a connector (for example: resize_and_crop), and center_crop can be used to first extract a center crop. (Option: preprocess; must be coordinated with load_size, center_crop_size, aspect_ratio.)
  • Better logic to automatically save and load an arbitrary number of networks during training by defining them in the model_names object in the initialization of model files.
  • Added the AdamP, SGDP, MADGRAD and Ranger optimizers as alternatives to the ones included in PyTorch (Adam, SGD, RMSprop).
  • Added the FlatCosineDecay scheduler to complement the Ranger optimizer strategy (similar to linear decay after a fixed lr period, but with a cosine function).
  • Fully integrated the ReduceLROnPlateau scheduler with the built metrics, so it can be used by selecting the metric to evaluate from the ones computed during the validation step. Note: nll is an optional metric that can be used when training SRFlow (select min as the plateau_mode).
  • Added multiscale pixel loss and experimental color, average, fft, overflow and range losses.
  • Added pix2pix and CycleGAN networks and training strategies. Besides the options available in the original code (image pool for the discriminator, conditional GAN training, etc.), they can also use the discriminators, losses and other functions available here, such as AMP, frequency separation, differential augmentations and others. Note: the original AB paired datasets from pix2pix can be used and will be automatically split into the A input domain and the B target domain, by using the outputs: AB and dataroot_AB option and path in the options file.
  • Added the white-box cartoonization (WBC) training strategy for unpaired photo to cartoon images. Same additions as pix2pix/CycleGAN, plus fine-grained losses configuration for each image representation.
  • Use of configuration presets and default-value injection, instead of editing full configuration files. Augmentation presets are defined in the blur, resize and noise files, with a base default configuration and additional overlays (which can be overridden with the main options file). Networks have a pre-flight configuration check. Currently optimizers and losses manage defaults on their own; moving them to defaults.py is TBD.
  • Added presets for Real-SR, BSRGAN and Real-ESRGAN (rESRGAN) training strategies.
  • Added an option to select which KernelGAN-extracted kernel scale to use during training (realk_scale).
  • Updated the (VGG) feature extractor network by merging the original VGG extractor and the one used by the contextual loss. It now has new options such as changing the first convolution's padding to reduce edge artifacts, removing pooling layers, changing MaxPool2d strides, and setting the network to trainable.
  • Upgraded the FeatureLoss function, enabled configuration of independent random paired rotations and flips, and included the style loss with Gram matrix in the same function to reuse the feature maps (note: the style loss will currently produce NaN if using AMP). The feature extractor and FeatureLoss configuration can be controlled from perceptual_opt in the options file.
  • Added Frobenius norm loss, mainly for feature loss, but can be used as a regular content/pixel loss (fro).
  • Updated the TV regularization for better logic and configurable output normalization to match different implementations (configuration via the options file TBD).
  • Added option to use standard GAN formulation (besides relativistic GAN formulation), configurable with gan_opt in the options file.
  • Added a unet discriminator alternative, with optional spectral normalization.
  • Added feature extraction/matching for the patchgan and multiscale patchgan discriminators, to be used as a feature loss, equivalent to discriminator_vgg_128_fea and discriminator_vgg_fea. Note that an additional option is required to enable it: under which_model_D, set get_feats: true.
  • Renamed project to traiNNer for simpler understandability of the repository function, similar to iNNfer for inference and augmeNNt for the augmentations.
  • Added an option to use gradient clipping. Clipping by both value (clip_grad_value_) and norm (clip_grad_norm_) is supported. The clipping value can be provided explicitly or, in the norm case, set to auto to calculate it automatically. Options: grad_clip and grad_clip_value.
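The pixel shift that the nearest_aligned scaling case above compensates for follows directly from the 0.5 × (s − 1) formula. A minimal pure-Python sketch (illustrative only, not the project's dataloader code; nearest_offset is a hypothetical helper):

```python
def nearest_offset(scale: int) -> float:
    """Pixel misalignment introduced by plain nearest-neighbor
    downscaling by an integer `scale`, per the formula above:
    0.5 * (scale - 1) pixels (zero only for scale 1)."""
    return 0.5 * (scale - 1)

# Offsets for common super-resolution scales:
for s in (1, 2, 4):
    print(f"x{s}: {nearest_offset(s)} px")
```

For a 4x model this is a 1.5-pixel shift between the LR input and the HR target, which is why the aligned variant matters for paired training.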

v2.0:

  • Added automatic loading of models with either the original ESRGAN architecture or the modified one.
  • Large rewrite to reduce code redundancy and duplication, reorganize the code and make it more modular.
  • The filters and image manipulations used by the different functions (HFEN, SSIM/MS-SSIM, SPL, TV/DTV, etc) are now consolidated in filters.py and colors.py.
  • Reusable loss builder/factory to reduce the changes needed when adding a new model, so new losses are added only once for all models.
  • Metrics builder to include only the selected ones during validation.
  • Integrated Automatic Mixed Precision (AMP) (code updated to work with PyTorch >= 1.6.0 and 1.3.0). Option: use_amp.
  • Contextual Loss (CX). Option: cx_type.
  • Differential Augmentations for efficient gan training (Paper). Option: diffaug.
  • Batch augmentations (based on Cutblur). Option: mixup.
  • ESRGAN+ improvements to the ESRGAN network (ESRGAN+). Options: gaussian and plus.
  • Adapted frequency filtering per loss function (Reference). In general, content losses receive the low-frequency images, feature losses non-filtered images and the discriminator high-frequency images. Option: fs.
  • Added on-the-fly use of realistic image kernels extracted with KernelGAN (Paper) and injection of noise extracted from real image patches (Reference).
  • Enabled an option to use the feature maps from the VGG-like discriminator in training for feature similarity (Reference). Option: discriminator_vgg_128_fea.
  • PatchGAN option for the discriminator (Reference). Option: patchgan.
  • Multiscale PatchGAN option for the discriminator (Reference). Option: multiscale.
  • Added a modified Pixel Attention Network for Efficient Image Super-Resolution (PAN), which includes a self-attention layer in the residual path, among other changes. A basic pretrained model for 4x scale can be found here.
  • Stochastic Weight Averaging (SWA, PyTorch) added as an option. Currently the change only applies to the generator network, switching from the original learning rate scheduler to the SWA scheduler after a defined number of iterations have passed (the original paper refers to the last 25% of training). The resulting SWA model can be converted to a regular model after training using the scripts/swa2normal.py script. Option: use_swa, plus configuration of the swa scheduler.
  • Migrated all the main on-the-fly augmentations to a new repository (augmennt) and initiated the change to OpenCV-based composable transformations for augmentations with a new dataloader.
  • Added the basic idea behind "Freeze Discriminator: A Simple Baseline for Fine-tuning GANs" (FreezeD) to accelerate training with transfer learning. It is possible to use a pretrained discriminator model and freeze a chosen number of its initial (bottom) layers. Option: freeze_loc, enabled for any of the VGG-like discriminators or patchgan (multiscale patchgan not yet added).
  • Integrated the Consistency Enforcing Module (CEM) from Explorable Super Resolution (Paper, Web). Available both for use during inference and during training (only using a default downsampling kernel ATM). It can be easily extended to use kernels estimated from the images for downscaling with KernelGAN from DLIP. More information on CEM here.
  • Added the training and testing codes for Super-Resolution using Normalizing Flow in PyTorch (SRFlow) models (including the GLOW reference code). A starter pretrained model can be found here, which used a model based on the original ESRGAN architecture for the RRDB module (it is necessary to use it to later be able to test model interpolations). Otherwise, the original SRFlow model used the modified ESRGAN pretrained model, which can also be used.
  • Other changes: added graceful interruption of training to continue from where it was interrupted, a virtual batch option, a strict model loading flag, support for using YAML or JSON options files, a color transfer script (color_transfer.py) with multiple algorithms to transfer image statistics (colors) from a reference image to another, integrated the forward_chop function into the SR model to crop images into patches before upscaling for inference on VRAM-constrained systems (use option test_mode: chop), plus general fixes and code refactoring.
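The SWA strategy above amounts to keeping a running average of generator weight snapshots collected late in training. A minimal pure-Python sketch of that incremental average (illustrative only; the actual implementation uses PyTorch's SWA scheduler and swa2normal.py, and swa_update is a hypothetical helper):

```python
def swa_update(avg_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into the SWA running average:
    avg <- (avg * n + new) / (n + 1), element-wise."""
    return [(a * n_averaged + w) / (n_averaged + 1)
            for a, w in zip(avg_weights, new_weights)]

# Average two snapshots of a toy 2-parameter "network":
avg = [0.0, 0.0]
for step, snapshot in enumerate([[1.0, 2.0], [3.0, 4.0]]):
    avg = swa_update(avg, snapshot, step)
print(avg)  # [2.0, 3.0] — the mean of the two snapshots
```

The averaged weights are what gets exported as the final model, which is why a conversion script back to a regular checkpoint is needed.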

v2.0 video architectures (still WIP):

  • Video network for optical flow and video super-resolution (SOFVSR). Pretrained model using 3 frames, trained on a subset of the REDS dataset here.
  • Added option to use different image upscaling networks with the HR optical flow estimation for video (Pretrained using 3 frames and default ESRGAN as SR network here).
  • Initial integration of the RIFE (Paper) architecture for Video Frame Interpolation (converted the trained model from three pickle files into a single pth model here).
  • Video ESRGAN (EVSRGAN) and SR3D networks using 3D convolutions for video super-resolution, inspired by "3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks" (Paper). EVSRGAN pretrained using 3 frames and default arch options here.
  • Real-time Deep Video Deinterlacing (Paper) training and testing codes implemented. Pretrained DVD models can be found here.
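The 3-frame models above consume short sliding windows of consecutive frames. A trivial sketch of that kind of windowing with edge padding (illustrative only; frame_windows is a hypothetical helper, not the project's actual video dataloader):

```python
def frame_windows(frames, n=3):
    """Yield one window of n consecutive frames per input frame,
    repeating the first/last frame at the sequence edges so every
    frame gets a centered window."""
    pad = n // 2
    padded = [frames[0]] * pad + list(frames) + [frames[-1]] * pad
    return [padded[i:i + n] for i in range(len(frames))]

# Windows over a 4-frame clip, indices standing in for frames:
print(frame_windows([0, 1, 2, 3], n=3))
# [[0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 3]]
```

Edge replication is one common choice for the boundary frames; reflection padding is another.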

v1.0:

  • On the fly (OTF) augmentations pipeline (gaussian/s&p/speckle/Poisson/dither noise, gaussian/box/average blur, JPEG compression, quantization based on self-organizing maps, others), with fully randomized application.
  • Total Variation (TV) regularization options. Useful for denoising tasks.
  • High-frequency error norm (HFEN) loss. Useful to keep high frequency information.
  • SSIM and MS-SSIM image structure loss functions.
  • Alternative content/pixel based loss functions: Charbonnier, Elastic, RelativeL1, L1CosineSim.
  • Experimental Spatial Profile Loss (SPL).
  • Partial Convolution based Padding (PartialConv2D). It should help prevent edge padding issues. Zero padding remains the default; PartialConv2D has shown better performance and faster convergence for segmentation and classification (https://arxiv.org/pdf/1811.11718.pdf). The code has been added, but the switch makes pretrained models using Conv2D incompatible. Test for inpainting and denoising.
  • Initial implementation of the PPON training, based on the original published paper, including the multiscale L1 loss in phase 2.
  • Updated the ESRGAN training model (SRRAGAN, now sr, covering super-resolution and restoration).
  • Implemented extraction of features from the discriminator to be used as a feature loss, as an alternative to classification networks like VGG (similar to SRPGAN).
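As an example of the alternative content losses above, the Charbonnier loss is a smooth, differentiable approximation of L1. A minimal pure-Python sketch (illustrative only; the project's version operates on tensors, and the eps default here is an assumption):

```python
import math

def charbonnier(pred, target, eps=1e-6):
    """Charbonnier loss: sqrt(diff^2 + eps^2), averaged over
    elements. Behaves like L1 for large errors while staying
    smooth near zero, where plain L1 has a non-differentiable kink."""
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for p, t in zip(pred, target)) / len(pred)

# For errors much larger than eps it is effectively a mean absolute error:
print(charbonnier([1.0, 2.0], [1.0, 4.0]))  # ~1.0 (mean of |0| and |2|)
```

The eps term is what distinguishes it from L1 and keeps gradients well-behaved for near-perfect predictions.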

Pre-v1.0