sklearn instrumentation package #1054
|
Hello and welcome, @crflynn! Please sign the CLA; I'll review afterwards.
|
I'm not sure how to get the docs to build. Sphinx doesn't seem to want to cooperate with the type hints I've provided. I have a nitpick_ignore for the sklearn BaseEstimator but that seems to not be enough.
ocelotl
left a comment
I'm not sure how to get the docs to build. Sphinx doesn't seem to want to cooperate with the type hints I've provided. I have a nitpick_ignore for the sklearn BaseEstimator but that seems to not be enough.
I looked into this issue; apparently Sphinx can't find a class for some reason. It seems similar to this one, which has a workaround here. Nevertheless, I tried it and it did not solve the documentation problem. Apparently the root cause of this issue is not the same as the one the workaround addresses, since sklearn.base.BaseEstimator is not being imported into any other module in sklearn for Sphinx to import from that new location. Will look further into this.
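For reference, the kind of conf.py entry under discussion would look roughly like the fragment below. This is only a sketch: nitpicky and nitpick_ignore are standard Sphinx options, but the exact target string is an assumption, and as noted above an entry along these lines was not enough to fix the build here.

```python
# Sphinx conf.py fragment (sketch). The target string is an assumption; the
# real entry has to match the name Sphinx prints in its warning.
nitpicky = True  # report every unresolved cross-reference as a warning

# Each entry is a (reference type, target) pair that Sphinx should not warn about.
nitpick_ignore = [
    ("py:class", "sklearn.base.BaseEstimator"),
]
```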
    class TestSklearn(TestBase):
        def test_package_instrumentation(self):
            ski = SklearnInstrumentor(packages=["sklearn"])
Is it necessary to pass ["sklearn"] to the constructor here? I am under the impression that this is done by default because of this.
It's not necessary, no. I can remove it.
    The base class' new method passes args and kwargs. We override because
    we init the class with configuration and Python raises TypeError when
    additional arguments are passed to the object.__new__() method.
This seems like an issue in the base class implementation of the singleton mechanism, actually... will have to look into this.
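To make the TypeError in question concrete, here is a minimal sketch of the pattern. BaseSingleton, ConfiguredBroken, ConfiguredFixed and try_build are hypothetical stand-ins, not the actual base instrumentor or the code in this PR; the sketch only assumes that the base class forwards its arguments to object.__new__().

```python
class BaseSingleton:
    """Hypothetical stand-in for a singleton base class (not the real one)."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # Forwarding the arguments is what triggers the TypeError below.
            cls._instance = object.__new__(cls, *args, **kwargs)
        return cls._instance


class ConfiguredBroken(BaseSingleton):
    """Takes configuration in __init__ but inherits the forwarding __new__."""

    def __init__(self, packages=None):
        self.packages = packages


class ConfiguredFixed(BaseSingleton):
    """Overrides __new__ so the configuration never reaches object.__new__()."""

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # Drop the configuration arguments; __init__ still receives them.
            cls._instance = object.__new__(cls)
        return cls._instance

    def __init__(self, packages=None):
        self.packages = packages


def try_build(factory):
    """Return the new instance, or the TypeError raised while building it."""
    try:
        return factory(packages=["sklearn"])
    except TypeError as exc:
        return exc
```

In this sketch, try_build(ConfiguredBroken) returns a TypeError while try_build(ConfiguredFixed) returns an instance, which is the reason for the override described in the docstring. Changing the base class to call object.__new__(cls) without arguments would make the override unnecessary.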
    for method_name in self.methods:
        if hasattr(klass, method_name):
            self._instrument_class_method(
                estimator=klass, method_name=method_name
Just a concern, what if there is another instrumentation also installed for the same package that is passed as an argument for this instrumentor? Would that cause double instrumentation?
The only way it wouldn't is if that other instrumentation package used the same strategy, i.e. resulted in a True return value from the _check_instrumented method. Order of instrumentation might also matter in that regard.
Related, one of the things I tried to do here was abstract the span decorator as an instrumentor arg with the idea that multiple instrumentations (say otel + datadog) could be applied to estimators by just instantiating multiple instrumentors with different decorators. However, this isn't feasible with the current code because of how instrumentation is applied with the _original_xxx attributes and the _check_instrumented method.
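A rough sketch of the guard strategy described above, for anyone following along. The names instrument_method, record and Doubler are hypothetical and this is not the code in the PR; it only assumes that the original method is stashed under an _original_<name> attribute and that the presence of that attribute is what marks a method as already instrumented.

```python
import functools


def instrument_method(klass, method_name, record):
    """Wrap klass.method_name once; return False if it is already wrapped."""
    original_attr = f"_original_{method_name}"
    # Guard: a marker attribute left by a previous call means the method
    # has already been wrapped, so wrapping again would double the spans.
    if hasattr(klass, original_attr):
        return False
    original = getattr(klass, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        # A real instrumentor would start a span here instead of recording.
        record.append(f"{type(self).__name__}.{method_name}")
        return original(self, *args, **kwargs)

    setattr(klass, original_attr, original)
    setattr(klass, method_name, wrapper)
    return True


class Doubler:
    """Toy estimator used only to exercise the sketch."""

    def predict(self, x):
        return 2 * x
```

Calling instrument_method(Doubler, "predict", calls) twice wraps the method only once. A second instrumentation package that used a different marker attribute would not see the guard and would wrap the method again, which is the double-instrumentation case raised above.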
After iterating for a few days I'm a bit stuck on instrumenting the library. In particular, I'm having a hard time patching methods which are class attributes (rather than instance attributes) because of the if_delegate_has_method decorator which exists on some metaestimators. This decorator acts as a conditional property which delegates if and only if specific instance attributes exist. The problem is here, where I could omit these methods, as I do for properties, but I believe spanning these methods is important because they delegate to other internal estimators, and omitting them would obscure some of the model hierarchy.
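To illustrate why these methods are awkward to patch, here is a simplified stand-in for a delegate-conditional method. It is not sklearn's implementation; ConditionalMethod, if_delegate_has, Predictor, TransformerOnly and Meta are all hypothetical names, and the sketch only assumes that the method should be visible on an instance when the delegate attribute provides a method of the same name.

```python
import types


class ConditionalMethod:
    """Simplified delegate-conditional method (not sklearn's code)."""

    def __init__(self, fn, delegate_name):
        self.fn = fn
        self.delegate_name = delegate_name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self.fn  # class-level access: no instance to inspect
        delegate = getattr(obj, self.delegate_name)
        # Raises AttributeError when the delegate lacks the method, which
        # makes hasattr(obj, name) return False for that instance.
        getattr(delegate, self.fn.__name__)
        return types.MethodType(self.fn, obj)


def if_delegate_has(delegate_name):
    """Hypothetical decorator factory mirroring the idea described above."""
    return lambda fn: ConditionalMethod(fn, delegate_name)


class Predictor:
    def predict(self, x):
        return x + 1


class TransformerOnly:
    def transform(self, x):
        return x


class Meta:
    """Toy metaestimator whose predict exists only if the inner one has it."""

    def __init__(self, inner):
        self.inner = inner

    @if_delegate_has("inner")
    def predict(self, x):
        return self.inner.predict(x)
```

In this sketch, hasattr(Meta(Predictor()), "predict") is True and hasattr(Meta(TransformerOnly()), "predict") is False. Replacing Meta.predict with a plain wrapper function via setattr would discard the descriptor and with it the per-instance condition, so a patch has to wrap the function held inside the descriptor instead.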
|
Hello! I have a PR to move some files you have in this PR to the Contrib repo, please let me know if this gets merged before the PR in the Contrib repo. Please see https://github.com/open-telemetry/opentelemetry-python-contrib/pulls/ |
Is there value in having per-model instrumentation patch all the methods, while whole-package instrumentation omits some of them? We can just update our documentation to reflect this. This way at least we have functionality for both.
|
I've got a solution for the autoinstrumentation delegation problems in the latest push, which passes the tests locally. It seems though that
I think this is a lot closer now. Let me know how we should go from here. |
lzchen
left a comment
Nice! Thanks for the contrib.
|
This PR is probably going to be affected by the migration.
|
Should I just move this over to the contrib repo? |
|
@crflynn Yes please! That would be super helpful. The contrib repo is ready to accept new PRs like these :) |
|
@NathanielRN I'll work on that later today |
|
@ocelotl can you review again? I would like to get this merged.

Description
Provides an opentelemetry instrumentation package for sklearn models, instrumenting internal spans at the estimator level. The motivation is to provide observability into machine learning models that run for realtime predictive applications that have many complex transformers and predictors.
The instrumentor adds spans to sklearn estimators according to a set of default estimator methods (namely fit, predict, predict_proba and transform) and other configuration parameters that determine how spans are implemented through the model hierarchy. The default configuration also handles Pipeline and FeatureUnion hierarchies. Since sklearn's API is easily extended, the configuration parameters allow for custom model hierarchy traversal, allowing spans to be implemented in custom estimators as well.
Type of change
How Has This Been Tested?
The package provides two tests for the implementation.
test_span_properties uses an sklearn model fixture and asserts span names, kinds, and parent-child relationships. test_attrib_config uses the same fixture to assert implementation of non-default configuration parameters.
I also have an example implementation here: https://github.com/crflynn/opentelemetry-sklearn
Checklist: