feat: add blazeface support#1187

Open
chmjkb wants to merge 5 commits into main from @chmjkb/blazeface

@chmjkb
Collaborator

@chmjkb chmjkb commented May 26, 2026

Description

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@chmjkb chmjkb marked this pull request as draft May 26, 2026 08:41
@chmjkb chmjkb force-pushed the @chmjkb/blazeface branch from cb4233a to 9809733 Compare May 26, 2026 08:42
@chmjkb chmjkb force-pushed the @chmjkb/blazeface branch from 9809733 to b3fcb85 Compare May 26, 2026 09:12
@chmjkb chmjkb marked this pull request as ready for review May 26, 2026 12:42
@chmjkb chmjkb requested a review from benITo47 May 26, 2026 12:42
/// Affine transform from model-input pixel coords back to source-image coords:
/// `x_src = x_model * scaleX + offsetX`. Covers both plain stretch (offsets
/// zero) and letterbox (offsets carry the centre-pad).
struct BoxTransform {
Contributor

Maybe move it to utils? It could be reused in other model pipelines as well.

Comment on lines 59 to -138
@@ -86,34 +87,37 @@ std::vector<types::Instance> BaseInstanceSegmentation::runInference(
   auto instances = collectInstances(
       forwardResult.get(), originalSize, modelInputSize, confidenceThreshold,
       classIndices, returnMaskAtOriginalResolution);
-  return finalizeInstances(std::move(instances), iouThreshold, maxInstances);
+  return finalizeInstances(std::move(instances), iouThreshold, maxInstances,
+                           useWeightedNms);
 }

 std::vector<types::Instance> BaseInstanceSegmentation::generateFromString(
     std::string imageSource, double confidenceThreshold, double iouThreshold,
     int32_t maxInstances, std::vector<int32_t> classIndices,
-    bool returnMaskAtOriginalResolution, std::string methodName) {
+    bool returnMaskAtOriginalResolution, std::string methodName,
+    bool useWeightedNms) {

   cv::Mat imageBGR = image_processing::readImage(imageSource);
   cv::Mat imageRGB;
   cv::cvtColor(imageBGR, imageRGB, cv::COLOR_BGR2RGB);

   return runInference(imageRGB, confidenceThreshold, iouThreshold, maxInstances,
-                      classIndices, returnMaskAtOriginalResolution, methodName);
+                      classIndices, returnMaskAtOriginalResolution, methodName,
+                      useWeightedNms);
 }

 std::vector<types::Instance> BaseInstanceSegmentation::generateFromFrame(
     jsi::Runtime &runtime, const jsi::Value &frameData,
     double confidenceThreshold, double iouThreshold, int32_t maxInstances,
     std::vector<int32_t> classIndices, bool returnMaskAtOriginalResolution,
-    std::string methodName) {
+    std::string methodName, bool useWeightedNms) {

   auto orient = ::rnexecutorch::utils::readFrameOrientation(runtime, frameData);
   cv::Mat frame = extractFromFrame(runtime, frameData);
   cv::Mat rotated = utils::rotateFrameForModel(frame, orient);
-  auto instances =
-      runInference(rotated, confidenceThreshold, iouThreshold, maxInstances,
-                   classIndices, returnMaskAtOriginalResolution, methodName);
+  auto instances = runInference(
+      rotated, confidenceThreshold, iouThreshold, maxInstances, classIndices,
+      returnMaskAtOriginalResolution, methodName, useWeightedNms);
   for (auto &inst : instances) {
     utils::inverseRotateBbox(inst.bbox, orient, rotated.size());
     // Inverse-rotate the mask to match the screen orientation
@@ -131,11 +135,13 @@ std::vector<types::Instance> BaseInstanceSegmentation::generateFromFrame(
 std::vector<types::Instance> BaseInstanceSegmentation::generateFromPixels(
     JSTensorViewIn tensorView, double confidenceThreshold, double iouThreshold,
     int32_t maxInstances, std::vector<int32_t> classIndices,
-    bool returnMaskAtOriginalResolution, std::string methodName) {
+    bool returnMaskAtOriginalResolution, std::string methodName,
+    bool useWeightedNms) {

   cv::Mat image = extractFromPixels(tensorView);
   return runInference(image, confidenceThreshold, iouThreshold, maxInstances,
-                      classIndices, returnMaskAtOriginalResolution, methodName);
Member

@msluszniak msluszniak May 26, 2026

We are modifying code that should be general (BASE instance segmentation) and bloating it with positional parameters that are useless for other models. What we have right now is a very poor architectural design. We need to discuss internally how to tackle it, because this is not a problem of this PR only, but of almost all models. I created an RFC in the Discussions section on how to deal with it. See #1189

Collaborator Author

@chmjkb chmjkb May 26, 2026

I feel like we already agreed internally on the underlying issue. I drafted an internal design doc on this ~2 months ago proposing pretty much the same fix: an ordered preprocessor pipeline driven from JS (postprocessing can be a bit trickier), with new models added via config rather than new C++. We just never sat down to implement it.

For this PR: I extended BaseInstanceSegmentation for symmetry with ObjectDetection, not because I think the pattern is right. Once we commit to the pipeline design (especially on the native side, as the TS side partially does it already), useWeightedNms collapses into a postprocessor field anyway.

Is this blocking for this PR? I'm happy to take on the refactor, since we're past the release, but I don't think it's in the scope of this PR.

Member

@msluszniak msluszniak May 26, 2026

It is certainly not in the scope of this PR, but yes, I would consider it blocking. The refactor should be top priority now. We should discuss internally how to proceed with it: either direct PRs to main, or PRs to a dedicated branch followed by one PR to main. Also, IMO the architecture outline needs to live in a file in the repository, not in internal resources. We can start by pushing a markdown file with an architectural outline abstracted from the design docs.

It is better to start by refactoring instance segmentation and object detection, and then use this model to test whether adding a brand-new model goes smoothly, rather than merging this and refactoring immediately afterwards.

3 participants