This is a general and non-exhaustive changelog of the major changes at certain milestones of the development of the repository, marked by the version tags. For some future directions, visit the projects page.
- Abstracted the common dataset base into `BaseDataset` and moved functions shared by all dataloaders to `base_dataset`.
- Finished converting the main dataloaders to use composable functions based on augmennt, with a focus on reusable functions and performance improvements.
- Merged all previous `LRHR*_dataset` loaders into a single `aligned_dataset`, modified `LR_dataset` into `single_dataset` for single image dataset loading and added the `unaligned_dataset` for unpaired images training.
- Added an independent weighted sampler for multiple distributions with concatenated datasets:
`MultiSampler`, used with the `concat_unaligned` dataset mode.
- Added support to use
`PIL` as the images backend (with Torchvision transforms), in parallel to the existing `OpenCV2` backend (with augmennt composable functions). `PIL` options and functionality are more limited, but can be extended as needed; `CV2` remains the main backend to use due to better performance and the optimized functions available. Minor notes: added a `pre_crop` option (option: `pre_crop`) to accelerate steps in the pipeline that perform resizing before cropping (like "random downscale HR/B" and when generating the LR/A image on the fly), and remade the `hr_rot` function for accurate free-range random rotations (option: `use_hrrot`).
- Enabled additional augmentations from augmennt: cv2 k-means based color quantization (
`km_quantize`), an option to compress images with WebP or JPEG (`webp`, `jpeg`), exposed the bilateral filter transform, extended the new `ApplyKernel` function for convolving with arbitrary kernels (besides estimated image kernels: KernelGAN) to add simple and complex motion blur transforms with randomly generated motion kernels (`motion`, `complexmotion`), CLAHE augmentation (`clahe`), superpixels (`superpixels`), raw camera noise via the unprocess/process pipeline (`camera`), new anisotropic and isotropic Gaussian blur filters (`iso`, `aniso`) and a sinc filter (`sinc`). Also updated the Gaussian noise, speckle noise and chromatic aberration augmentations to new versions.
- Added new interpolation method kernel options for use with the accurate and antialiased Matlab-like
`imresize` function, intended to train models with a wider variety of kernels, including some that are specific to certain use cases. All the available methods are: `blackman5`, `blackman4`, `blackman3`, `blackman2`, `sinc5`, `sinc4`, `sinc3`, `sinc2`, `gaussian`, `hamming`, `hanning`, `catrom`, `bell`, `hermite`, `mitchell`, `cubic` (bicubic), `lanczos5`, `lanczos4`, `lanczos3`, `lanczos2`, `box`, `linear` (bilinear). The OpenCV methods now have the prefix `cv2_` and realistic kernels (i.e. extracted with KernelGAN) can be combined during training. Note: using PIL as the image backend currently only allows resizing with PIL's bicubic method.
- Extended the OTF augmentations pipeline with second and final blur steps, additional compression steps, in-pipeline resizing and a random shuffling flag, among others. The original pre-pipeline resizing strategy can still be used with
`resize_strat: 'pre'` and in-pipeline resizing with `resize_strat: 'in'`.
- Added special scaling cases:
`down_up` to randomize scales when resizing in-pipeline and `nearest_aligned`, which prevents the misalignment of 0.5 × (s − 1) pixels when using regular nearest-neighbor downscaling. Additionally, when used in-pipeline, random `up`/`down`/`keep` probabilities and scale ranges can be defined, with adaptive scale in case images are already at the final target scale.
- Added option to select which kernel scale to use for the realistic option.
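As a minimal illustration of the alignment issue that `nearest_aligned` addresses (a sketch only, not the repository's implementation): plain strided sampling keeps the top-left pixel of each s × s block, shifting content by 0.5 × (s − 1) pixels, while sampling the block center keeps LR and HR registered. Even scales need a sub-pixel shift via interpolation, which is omitted here.

```python
import numpy as np

def nearest_aligned_downscale(img, scale):
    """Center-aligned nearest-neighbor downscale for odd integer scales.

    img[::scale, ::scale] keeps the top-left pixel of each block and so
    shifts the result by 0.5 * (scale - 1) pixels relative to the HR
    image; sampling the (near-)center pixel of each block avoids that.
    """
    off = (scale - 1) // 2  # offset to the center of each block
    return img[off::scale, off::scale]

hr = np.arange(81).reshape(9, 9)
naive = hr[::3, ::3]                         # top-left sample: shifted
aligned = nearest_aligned_downscale(hr, 3)   # center sample of each 3x3 block
```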
- Updated the
`imresize` function with a new version adapted from ResizeRight to solve the incorrectness issues of standard frameworks (OpenCV, PIL, Torch, TensorFlow, others).
- Added the Adaptive Target Generator (ATG) and strategy from AdaTarget to spatially match the network outputs to the target images. More information in AdaTarget. Option:
`use_atg`.
- Added option to use Pixel-Unshuffle to fully replicate the CutBlur batch augmentation strategy and as an option to convert the input images to lower resolution and extended depth, reducing the resources required to train them. Option:
`use_unshuffle`.
- Included a flag to select the image pre-processing option that is most adequate for the model training task (necessary for image-to-image translation options). Besides the default
`crop` to crop patches for super-resolution, denoising and deblurring, there are now also `resize`, `scale_width`, `scale_height`, `scale_shortside` and `fixed` for image-to-image translation and other cases. These can be combined using `_and_` as a connector (for example: `resize_and_crop`). `center_crop` can also be used to first extract a center crop. (Option: `preprocess`; must be coordinated with `load_size`, `center_crop_size` and `aspect_ratio`.)
- Better logic to automatically save and load an arbitrary number of networks during training by defining them in the
`model_names` object in the initialization of the model files.
- Added AdamP, SGDP, MADGRAD and Ranger optimizers as alternatives to the ones included in PyTorch (Adam, SGD, RMSprop).
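A name-driven optimizer factory along these lines could back such an options-file setting (illustrative only: the function name and registry are assumptions, and the third-party optimizers would be registered from their own packages in the same way):

```python
import torch

# hypothetical registry mapping option-file names to optimizer classes;
# AdamP, SGDP, MADGRAD and Ranger would be added from their packages
OPTIM_REGISTRY = {
    "adam": torch.optim.Adam,
    "sgd": torch.optim.SGD,
    "rmsprop": torch.optim.RMSprop,
}

def get_optimizer(name, params, lr=1e-4, **kwargs):
    """Build an optimizer selected by name, as an options file would."""
    try:
        cls = OPTIM_REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"unknown optimizer: {name}")
    return cls(params, lr=lr, **kwargs)

net = torch.nn.Linear(4, 4)
opt = get_optimizer("adam", net.parameters(), lr=2e-4)
```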
- Added
`FlatCosineDecay` scheduler to complement the Ranger optimizer strategy (similar to linear decay after a fixed learning rate period, but with a cosine function).
- Fully integrated the
`ReduceLROnPlateau` scheduler with the metrics built, so it can be used by selecting the metric to use for evaluation from the ones computed during the validation step. Note: `nll` is an optional metric that can be used when training SRFlow (select `min` as the `plateau_mode`).
- Added
`multiscale` pixel loss and experimental `color`, `average`, `fft`, `overflow` and `range` losses.
- Added pix2pix and CycleGAN networks and training strategies. Besides the options available in the original code (image pool for the discriminator, conditional GAN training, etc.), they can also use the discriminators, losses and other functions available here, such as
AMP, frequency separation, differential augmentations and others. Note: the original `AB` paired datasets from `pix2pix` can be used and will be automatically split into the `A` input domain and the `B` target domain by using the `outputs: AB` and `dataroot_AB` option and path in the options file.
- Added the white-box cartoonization (WBC) training strategy for unpaired photo-to-cartoon images. Same additions as
pix2pix/CycleGAN, plus fine-grained loss configuration for each image representation.
- Use of configuration presets and default value injection, instead of editing full configuration files. Augmentation presets are defined in blur, resize and noise files, with a base default configuration and additional overlays (can be overridden with the main options file). Networks have a pre-flight configuration check. Currently optimizers and losses manage defaults on their own; moving them to
`defaults.py` is TBD.
- Added presets for
`Real-SR`, `BSRGAN` and `Real-ESRGAN` (`rESRGAN`) training strategies.
- Added option to select which KernelGAN-extracted kernel scale to use during training (
`realk_scale`).
- Updated the (
`VGG`) feature extractor network, by merging the original `VGG` extractor and the one used by the contextual loss. It now has new options like changing the first convolution padding to reduce edge artifacts, removing pooling layers, changing `MaxPool2d` strides, and an option to set the network to trainable.
- Upgraded the
`FeatureLoss` function, enabled configuration of independent random paired rotations and flips, and included the style loss with `gram matrix` in the same function to reuse the feature maps (note: currently the style loss will produce `NaN` if using `AMP`). The feature extractor and `FeatureLoss` configuration can be controlled from `perceptual_opt` in the options file.
- Added
`Frobenius` norm loss, mainly intended as a feature loss, but it can be used as a regular content/pixel loss (`fro`).
- Updated the
`TV` regularization for better logic and configurable output normalization to match different implementations. (Configuration via the options file is TBD.)
- Added option to use the standard
`GAN` formulation (besides the relativistic `GAN` formulation), configurable with `gan_opt` in the options file.
- Added a
`unet` discriminator alternative, with optional spectral normalization.
- Added feature extraction/matching for the
`patchgan` and `multiscale patchgan` discriminators, to be used as a feature loss, equivalent to `discriminator_vgg_128_fea` and `discriminator_vgg_fea`. Note that an additional option is required to enable it: under `which_model_D`, set `get_feats: true`.
- Renamed the project to
`traiNNer` for simpler understandability of the repository's function, similar to `iNNfer` for inference and `augmeNNt` for the augmentations.
- Added option to use gradient clipping. Clipping by both value (
`clip_grad_value_`) and norm (`clip_grad_norm_`) is supported. The clipping value can be provided or, in the `norm` case, `auto` can be used to automatically calculate the clipping value. Options: `grad_clip` and `grad_clip_value`.
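The two clipping modes map directly onto PyTorch's utilities; a minimal sketch (the option wiring around them is illustrative, only the two calls are PyTorch's API):

```python
import torch
from torch.nn.utils import clip_grad_norm_, clip_grad_value_

net = torch.nn.Linear(8, 8)
loss = net(torch.randn(2, 8)).sum()
loss.backward()

# clip by value: every gradient element is clamped to [-0.5, 0.5]
clip_grad_value_(net.parameters(), clip_value=0.5)
# clip by norm: gradients are rescaled so their total norm is <= 1.0;
# the pre-clipping norm is returned
total_norm = clip_grad_norm_(net.parameters(), max_norm=1.0)
```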
- Added automatic loading of models of either the original ESRGAN architecture or the modified one.
- A large rewrite of the code was made to reduce redundancy and duplication, reorganize the code and make it more modular.
- The filters and image manipulations used by the different functions (
`HFEN`, `SSIM`/`MS-SSIM`, `SPL`, `TV`/`DTV`, etc.) are now consolidated in filters.py and colors.py.
- Reusable loss builder/factory to reduce the changes needed when using a new model, adding new losses only once for all models.
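The loss builder idea can be sketched as follows (the names, option keys and toy losses here are hypothetical, purely to show the pattern of assembling weighted losses once from an options mapping):

```python
# toy element-wise losses standing in for the real ones
def l1_loss(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def mse_loss(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

LOSS_REGISTRY = {"l1": l1_loss, "mse": mse_loss}

def build_losses(opt):
    """Return (name, weight, fn) triples for every enabled loss."""
    return [(name, cfg["weight"], LOSS_REGISTRY[name])
            for name, cfg in opt.items() if cfg.get("weight", 0) > 0]

def total_loss(losses, x, y):
    """Weighted sum over all built losses, reusable by any model."""
    return sum(w * fn(x, y) for _, w, fn in losses)

opt = {"l1": {"weight": 1.0}, "mse": {"weight": 0.5}}
losses = build_losses(opt)
val = total_loss(losses, [1.0, 2.0], [1.5, 2.5])  # 1.0*0.5 + 0.5*0.25
```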
- Metrics builder to include only the selected ones during validation.
- Integrated Automatic Mixed Precision (AMP). (Code updated to work with PyTorch >= 1.6.0 and 1.3.0.) Option:
`use_amp`.
- Contextual Loss (CX, CX). Option: `cx_type`.
- Differential Augmentations for efficient GAN training (Paper). Option: `diffaug`.
- Batch augmentations (based on CutBlur). Option:
`mixup`.
- `ESRGAN+` improvements to the ESRGAN network (ESRGAN+). Options: `gaussian` and `plus`.
- Adapted
frequency filtering per loss function (Reference). In general, content losses receive the low-frequency images, feature losses the non-filtered images and the discriminator the high-frequency images. Option: `fs`.
- Added on-the-fly use of realistic image kernels extracted with KernelGAN (Paper) and injection of noise extracted from real image patches (Reference).
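A minimal sketch of the frequency separation idea, using a simple box blur as the low-pass filter (the actual filter and wiring in the repository may differ): the blurred image is the low-frequency part and the residual is the high-frequency part, so the two always sum back to the original.

```python
import numpy as np

def freq_split(img, ksize=5):
    """Split an image into low- and high-frequency components."""
    k = np.ones((ksize, ksize)) / (ksize * ksize)  # box blur kernel
    pad = ksize // 2
    padded = np.pad(img, pad, mode="edge")
    low = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            low[i, j] = (padded[i:i + ksize, j:j + ksize] * k).sum()
    high = img - low  # residual: by construction low + high == img
    return low, high

img = np.random.rand(16, 16)
low, high = freq_split(img)
# content losses would see `low`, the discriminator `high`
```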
- Enabled option to use the feature maps from the VGG-like discriminator in training for feature similarity (Reference). Option:
`discriminator_vgg_128_fea`.
- PatchGAN option for the discriminator (Reference). Option:
`patchgan`.
- Multiscale PatchGAN option for the discriminator (Reference). Option:
`multiscale`.
- Added a modified Pixel Attention Network for Efficient Image Super-Resolution (PAN), which includes a self-attention layer in the residual path, among other changes. A basic pretrained model for 4x scale can be found here.
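The core pixel attention mechanism PAN builds on can be sketched in a few lines (this omits the modified network's additions, such as the self-attention layer in the residual path): a 1×1 convolution followed by a sigmoid produces a per-pixel attention map that rescales the features.

```python
import torch
import torch.nn as nn

class PixelAttention(nn.Module):
    """Minimal pixel attention block: per-pixel gating of features."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # attention map in (0, 1), same shape as x, applied element-wise
        return x * torch.sigmoid(self.conv(x))

x = torch.randn(1, 8, 16, 16)
y = PixelAttention(8)(x)  # output keeps the input shape
```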
- Stochastic Weight Averaging (SWA, PyTorch) added as an option. Currently the change only applies to the generator network, switching the original learning rate scheduler to the SWA scheduler after a defined number of iterations has passed (the original paper refers to the last 25% of training). The resulting
`SWA` model can be converted to a regular model after training using the scripts/swa2normal.py script. Option: `use_swa`, plus configuration of the SWA scheduler.
- Migrated all main on-the-fly augmentations to a new repository (augmennt) and initiated the change to use OpenCV-based composable transformations for augmentations with a new dataloader.
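The SWA handover described above roughly follows PyTorch's `swa_utils`; a compressed sketch, where the tiny network, iteration counts and learning rates are placeholders:

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

net = torch.nn.Linear(4, 4)        # stands in for the generator
opt = torch.optim.SGD(net.parameters(), lr=1e-2)
base_sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50)
swa_net = AveragedModel(net)       # running average of the weights
swa_sched = SWALR(opt, swa_lr=5e-3)

swa_start = 75                     # e.g. the last 25% of 100 iterations
for it in range(100):
    loss = net(torch.randn(2, 4)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if it >= swa_start:
        # after swa_start: swap schedulers and start averaging weights
        swa_net.update_parameters(net)
        swa_sched.step()
    else:
        base_sched.step()
```

After training, `swa_net` holds the averaged generator, which is what a conversion script like scripts/swa2normal.py would unwrap back into a regular model.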
- Added the basic idea behind "Freeze Discriminator: A Simple Baseline for Fine-tuning GANs" (FreezeD) to accelerate training with transfer learning. It is possible to use a pretrained discriminator model and freeze the initial (bottom) X number of layers. Option:
`freeze_loc`; enabled for any of the VGG-like discriminators or `patchgan` (`multiscale patchgan` not yet added).
- Integrated the Consistency Enforcing Module (
`CEM`) from Explorable Super Resolution (Paper, Web). Available both for use during inference and during training (only using a default downsampling kernel ATM). Can be easily extended to use kernels estimated from the images for downscaling using `KernelGAN` from DLIP. More information on CEM here.
- Added the training and testing codes for Super-Resolution using Normalizing Flow in PyTorch (SRFlow) models (including the GLOW reference code). A starter pretrained model can be found here, which used a model based on the original ESRGAN architecture for the RRDB module (it is necessary to use it to later be able to test model interpolations). Otherwise, the original SRFlow model used the modified ESRGAN pretrained model, which can also be used.
- Other changes: added graceful interruption of training to continue from where it was interrupted, virtual batch option,
`strict` model loading flag, support for using YAML or JSON options files, a color transfer script (color_transfer.py) with multiple algorithms to transfer image statistics (colors) from a reference image to another, integrated the `forward_chop` function into the SR model to crop images into patches before upscaling for inference on VRAM-constrained systems (use option `test_mode: chop`), and general fixes and code refactoring.
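The `forward_chop` idea can be illustrated with a toy patch-wise upscaler (a sketch only; a real implementation overlaps the tiles to hide seams, which is omitted here): split the image into tiles, upscale each independently, and stitch the results, trading VRAM for multiple forward passes.

```python
import numpy as np

def forward_chop(img, upscale_fn, scale=4, patch=8):
    """Upscale an image tile by tile to bound peak memory use."""
    h, w = img.shape[:2]
    out = np.zeros((h * scale, w * scale), dtype=img.dtype)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            tile = img[y:y + patch, x:x + patch]
            out[y * scale:(y + tile.shape[0]) * scale,
                x * scale:(x + tile.shape[1]) * scale] = upscale_fn(tile)
    return out

def up(t):
    # stand-in "model": 4x nearest-neighbor upscale
    return t.repeat(4, axis=0).repeat(4, axis=1)

img = np.random.rand(16, 16)
sr = forward_chop(img, up, scale=4, patch=8)
```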
- Video network for optical flow and video super-resolution (SOFVSR). Pretrained model using 3 frames, trained on a subset of
`REDS` dataset, here.
- Added option to use different image upscaling networks with the HR optical flow estimation for video (pretrained using 3 frames and default ESRGAN as the SR network, here).
- Initial integration of
`RIFE` (Paper) architecture for video frame interpolation (converted the trained model from three pickle files into a single pth model, here).
- Video ESRGAN (
`EVSRGAN`) and `SR3D` networks using 3D convolutions for video super-resolution, inspired by "3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks" (Paper). EVSRGAN pretrained using 3 frames and default arch options, here.
- Real-time Deep Video Deinterlacing (Paper) training and testing codes implemented. Pretrained DVD models can be found here.
- On the fly (
`OTF`) augmentations pipeline (Gaussian/S&P/speckle/Poisson/dither noise, Gaussian/box/average blur, JPEG compression, quantization based on self-organizing maps, others), with fully randomized application.
- Total Variation (
`TV`) regularization options. Useful for denoising tasks.
- High-frequency error norm (HFEN) loss. Useful to keep high-frequency information.
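The TV regularization above is just the summed differences between neighboring pixels; a minimal sketch with a simple mean normalization (the repository's normalization is configurable and may differ):

```python
import numpy as np

def tv_loss(img, norm=1):
    """Anisotropic total variation: mean |difference| between neighbors.

    A constant image has zero TV; noise increases it, which is why
    penalizing TV encourages smooth (denoised) outputs.
    """
    dh = np.abs(np.diff(img, axis=0)) ** norm  # vertical differences
    dw = np.abs(np.diff(img, axis=1)) ** norm  # horizontal differences
    return dh.mean() + dw.mean()

flat = np.ones((8, 8))
noisy = flat + np.random.rand(8, 8)
```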
- `SSIM` and `MS-SSIM` image structure loss functions.
- Alternative content/pixel based loss functions:
`Charbonnier`, `Elastic`, `RelativeL1`, `L1CosineSim`.
- Experimental Spatial Profile Loss (SPL).
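The Charbonnier loss above is simple to sketch; it is a smooth L1 variant that behaves like L2 near zero and like L1 for large errors, making it more robust to outliers than plain L2 (the `eps` default here is an assumption):

```python
import numpy as np

def charbonnier(x, y, eps=1e-6):
    """Charbonnier loss: sqrt((x - y)^2 + eps^2), averaged."""
    return np.sqrt((x - y) ** 2 + eps ** 2).mean()

a = np.zeros(4)
b = np.array([0.0, 1.0, -1.0, 2.0])
loss = charbonnier(a, b)  # for small eps, close to mean(|a - b|) = 1.0
```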
- Partial Convolution based Padding (PartialConv2D). It should help prevent edge padding issues. Zero padding is the default and typically has the best performance; PartialConv2D has better performance and converges faster for segmentation and classification (https://arxiv.org/pdf/1811.11718.pdf). The code has been added, but the switch makes pretrained models using Conv2D incompatible. To be tested for inpainting and denoising.
- Initial implementation of the
`PPON` training, based on the original published paper, including the multiscale L1 loss in phase 2.
- Updated the
`ESRGAN` training model (SRRAGAN, now `sr` for super-resolution and restoration).
- Implemented extraction of features from the discriminator to be used as a feature loss, as an alternative to classification networks like
`VGG` (similar to `SRPGAN`).
- Use of the relativistic GAN training strategy.
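The relativistic discriminator (as used by ESRGAN's RaGAN formulation) compares a sample's critic output against the average of the opposite class rather than judging it in isolation; a small numeric sketch of the discriminator-side loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relativistic_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss from raw critic outputs.

    D estimates whether a real sample is *more realistic than the
    average fake* (and a fake *less realistic than the average real*).
    """
    d_real = sigmoid(c_real - c_fake.mean())  # real vs. average fake
    d_fake = sigmoid(c_fake - c_real.mean())  # fake vs. average real
    eps = 1e-12  # numerical guard for the logs
    return -(np.log(d_real + eps).mean() + np.log(1 - d_fake + eps).mean())

c_real = np.array([2.0, 3.0])   # critic outputs on real images
c_fake = np.array([-1.0, 0.0])  # critic outputs on fakes
loss = relativistic_d_loss(c_real, c_fake)
```

When the critic already separates the classes well (as above), the loss is small; swapping the two arguments simulates a critic that is confidently wrong and yields a much larger loss.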
- Original BasicSR codes, used to train ESRGAN.