Disassociate RT membership from connectivity#50
Conversation
daa33aa to
ae0bcf5
Compare
table.go
Outdated
| // NewRoutingTable creates a new routing table with a given bucketsize, local ID, and latency tolerance. | ||
| func NewRoutingTable(bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics) *RoutingTable { | ||
| // Passing a nil PeerValidationFnc disables periodic table cleanup. | ||
| func NewRoutingTable(ctx context.Context, bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics, |
There was a problem hiding this comment.
Just curious; why not make the peerValidationFnc one of the Options? We could potentially convert the other args (latency, etc) into Options as well, as long as we're changing the signature.
There was a problem hiding this comment.
Using contexts to stop services is really an anti-pattern (yeah, I know we use it everywhere...). We should provide a close function.
Otherwise, the routing table could stop before the DHT (context cancellation isn't ordered).
There was a problem hiding this comment.
@Stebalien Could you elaborate a bit on this ? Why is it an anti-pattern ? Also, could you direct me to some code which uses a Close function like you mention ?
There was a problem hiding this comment.
Context cancellation is designed for requests: https://golang.org/pkg/context/.
- Previous answer: Simulate DONT_HAVE for older peers ipfs/go-bitswap#248 (comment)
- Lots of discussion here: Discussion: Adopting a pattern/library for managing long lived go-routines ipfs/kubo#5810
Basically, if we use contexts, everything will shutdown at the same time in a random order without taking dependencies into account. If we use explicit close methods, we can close a service and then have it close the sub-components it owns, etc.
We've had problems in the past where we'd, e.g., close the datastore before we're done using it.
There was a problem hiding this comment.
@Stebalien Thanks for this. I've made the change. Also dug through the goprocess library for this & now have a pretty good idea of how it works internally.
There was a problem hiding this comment.
Personally speaking I don't think using context this way is bad. goprocess is the antipattern, not context + cancellation. Additionally goprocess turns non-trivial goroutines into relatively speaking expensive operations.
table.go
Outdated
| // NewRoutingTable creates a new routing table with a given bucketsize, local ID, and latency tolerance. | ||
| func NewRoutingTable(bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics) *RoutingTable { | ||
| // Passing a nil PeerValidationFnc disables periodic table cleanup. | ||
| func NewRoutingTable(ctx context.Context, bucketsize int, localID ID, latency time.Duration, m peerstore.Metrics, |
There was a problem hiding this comment.
Using contexts to stop services is really an anti-pattern (yeah, I know we use it everywhere...). We should provide a close function.
Otherwise, the routing table could stop before the DHT (context cancellation isn't ordered).
|
@yusefnapora @Stebalien Have made the changes save one, where I'd like some more information on why using contexts for service cancellation is an anti-pattern. Please take a look when you can. |
|
@Stebalien I believe other than the one comment where we are waiting for @raulk , all your comments have been addressed. Let me know if you have any additional concerns. |
Stebalien
left a comment
There was a problem hiding this comment.
This mostly looks right but I'm not awake enough at the moment to give a proper review. However, I do have a question.
|
|
ping @Stebalien |
Eh, that's probably not an issue for now. We're not going for speed here, we're trying to do a background cleanup. I guess we could be too slow, but that's probably not the end of the world. Regardless, we should set a pretty short timeout.
Yes, but let's do that in a followup patch. The current code should be "good enough" and there are some tricky edge-cases here (eg., don't reconnect to the same overloaded peer every 10 seconds). |
| } | ||
| log.Infof("failed to validated candidate=%s", c) | ||
| // remove candidate | ||
| rt.HandlePeerDead(c) |
There was a problem hiding this comment.
Is this actually necessary? We've already removed it from the queue and it shouldn't be in the table. I guess it can't hurt.
There was a problem hiding this comment.
It's necessary as we remove it from the replacement cache as part of HandlePeerDead .
|
(merge when ready) |
@raulk @Stebalien @aschmahmann @dirkmc
For libp2p/go-libp2p-kad-dht#283
1. Address management is not a part of this PR & will be next.
2. DHT changes will be made once this goes in.
3. IMO, moving the RT inside
go-libp2p-kad-dhtis a bigger refactoring that will involve multiple discussions and should be done as a separate stream of work.