feat: add blazeface support#1187
Conversation
| /// Affine transform from model-input pixel coords back to source-image coords: | ||
| /// `x_src = x_model * scaleX + offsetX`. Covers both plain stretch (offsets | ||
| /// zero) and letterbox (offsets carry the centre-pad). | ||
| struct BoxTransform { |
There was a problem hiding this comment.
Maybe move it to utils? This might be used in other model pipes as well.
| @@ -86,34 +87,37 @@ std::vector<types::Instance> BaseInstanceSegmentation::runInference( | |||
| auto instances = collectInstances( | |||
| forwardResult.get(), originalSize, modelInputSize, confidenceThreshold, | |||
| classIndices, returnMaskAtOriginalResolution); | |||
| return finalizeInstances(std::move(instances), iouThreshold, maxInstances); | |||
| return finalizeInstances(std::move(instances), iouThreshold, maxInstances, | |||
| useWeightedNms); | |||
| } | |||
|
|
|||
| std::vector<types::Instance> BaseInstanceSegmentation::generateFromString( | |||
| std::string imageSource, double confidenceThreshold, double iouThreshold, | |||
| int32_t maxInstances, std::vector<int32_t> classIndices, | |||
| bool returnMaskAtOriginalResolution, std::string methodName) { | |||
| bool returnMaskAtOriginalResolution, std::string methodName, | |||
| bool useWeightedNms) { | |||
|
|
|||
| cv::Mat imageBGR = image_processing::readImage(imageSource); | |||
| cv::Mat imageRGB; | |||
| cv::cvtColor(imageBGR, imageRGB, cv::COLOR_BGR2RGB); | |||
|
|
|||
| return runInference(imageRGB, confidenceThreshold, iouThreshold, maxInstances, | |||
| classIndices, returnMaskAtOriginalResolution, methodName); | |||
| classIndices, returnMaskAtOriginalResolution, methodName, | |||
| useWeightedNms); | |||
| } | |||
|
|
|||
| std::vector<types::Instance> BaseInstanceSegmentation::generateFromFrame( | |||
| jsi::Runtime &runtime, const jsi::Value &frameData, | |||
| double confidenceThreshold, double iouThreshold, int32_t maxInstances, | |||
| std::vector<int32_t> classIndices, bool returnMaskAtOriginalResolution, | |||
| std::string methodName) { | |||
| std::string methodName, bool useWeightedNms) { | |||
|
|
|||
| auto orient = ::rnexecutorch::utils::readFrameOrientation(runtime, frameData); | |||
| cv::Mat frame = extractFromFrame(runtime, frameData); | |||
| cv::Mat rotated = utils::rotateFrameForModel(frame, orient); | |||
| auto instances = | |||
| runInference(rotated, confidenceThreshold, iouThreshold, maxInstances, | |||
| classIndices, returnMaskAtOriginalResolution, methodName); | |||
| auto instances = runInference( | |||
| rotated, confidenceThreshold, iouThreshold, maxInstances, classIndices, | |||
| returnMaskAtOriginalResolution, methodName, useWeightedNms); | |||
| for (auto &inst : instances) { | |||
| utils::inverseRotateBbox(inst.bbox, orient, rotated.size()); | |||
| // Inverse-rotate the mask to match the screen orientation | |||
| @@ -131,11 +135,13 @@ std::vector<types::Instance> BaseInstanceSegmentation::generateFromFrame( | |||
| std::vector<types::Instance> BaseInstanceSegmentation::generateFromPixels( | |||
| JSTensorViewIn tensorView, double confidenceThreshold, double iouThreshold, | |||
| int32_t maxInstances, std::vector<int32_t> classIndices, | |||
| bool returnMaskAtOriginalResolution, std::string methodName) { | |||
| bool returnMaskAtOriginalResolution, std::string methodName, | |||
| bool useWeightedNms) { | |||
|
|
|||
| cv::Mat image = extractFromPixels(tensorView); | |||
| return runInference(image, confidenceThreshold, iouThreshold, maxInstances, | |||
| classIndices, returnMaskAtOriginalResolution, methodName); | |||
There was a problem hiding this comment.
We are modifying the code that should be general (BASE instance segmentation) and we bloat it with some positional parameters that are useless for other models. This is super bad architectural design what we have right now. We need to discuss this internally how we should tackle that. Because this is not the problem of this PR only, but almost all models. I created RFC in discussion section how to deal with this one. See #1189
There was a problem hiding this comment.
I feel like we already agreed internally on the underlying issue, I drafted an internal design doc on this ~2 months ago proposing pretty much the same fix: ordered preprocessor pipeline from JS, (postprocessing can be a bit trickier) new models added via config rather than new C++. We just didn't really spend time to sit down and implement it.
For this PR: extending BaseInstanceSegmentation was symmetry with ObjectDetection, not because I think the pattern is right. Once we commit to the pipeline design (especially on the native side, as the TS partially does it) useWeightedNms collapses into a postprocessor field anyway.
Is this blocking for this PR? I'm happy to take on the refactor, since we're past the release, but I don't feel like it's the scope of this PR.
There was a problem hiding this comment.
It is not a scope of this PR for sure, but yeah, I would consider it a blocking. The refactor should be top priority now. We should discuss this internally how we should proceed with the refactor: either direct PRs to main or PRs to dedicated branch and then one to main. Also imo, we need to have the file with architecture outline in the repository, not in internal resources. We can start by pushing markdown with architectural outline abstracted from design docs.
It is better to start with refactor of instance segmentation and object detection, and test on this model if addition of brand new model goes smoothly rather than merging this and refactor immediately after.
Description
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Screenshots
Related issues
Checklist
Additional notes