Content Discovery

This plan evaluates different strategies for content discovery. For more information on how to run it, please follow the instructions in the testplans README.

The main solution is currently based on gossip messaging, though we may add other solutions for different use cases in the future.

Global parameters

Groups

In this plan we work with different node groups to replicate real-world scenarios we expect to encounter in the Myel network. These groups are defined relative to the session we run in our test plan.

  • Nodes in the clients group make queries for content during the lifecycle of our session.
  • Nodes in the providers group store the content requested by clients and reply to client queries.
  • Nodes in the bystanders group may be client or provider nodes but are neither actively querying for content nor storing the content clients in our session are looking for. Since not all nodes in the network store or request the same content, we assume a majority of nodes are bystanders to a given session (see the sketch after this list).
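The sketch below illustrates how a node's behaviour could branch on its assigned group. The group name and the runQueries, serveContent and stayIdle helpers are hypothetical stand-ins for the plan's actual client, provider and bystander routines, not its real code.

```go
package main

import "fmt"

// runRole branches a node's behaviour on the group it was assigned to.
// The helpers below are illustrative placeholders only.
func runRole(group string) error {
	switch group {
	case "clients":
		return runQueries() // issue content queries during the session
	case "providers":
		return serveContent() // store content and answer client queries
	case "bystanders":
		return stayIdle() // stay connected but neither query nor provide
	default:
		return fmt.Errorf("unknown group: %s", group)
	}
}

func runQueries() error   { return nil }
func serveContent() error { return nil }
func stayIdle() error     { return nil }

func main() {
	if err := runRole("clients"); err != nil {
		fmt.Println(err)
	}
}
```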
Connections

The way peers are connected in the network heavily influences the performance of discovery sessions. To simulate real world network topologies we use a combination of bootstrapping by connecting some nodes directly and letting the DHT randomly distribute connections.

  • bootstrap defines the number of nodes we directly connect with when setting up our node. A higher number of bootstrap nodes means our network will be more densely connected in a shorter time frame.
  • min_conns is the minimum number of connections our node's routing should strive to maintain through the DHT.
  • max_conns is the number of connections beyond which our node starts pruning. We therefore expect the number of connected peers to settle somewhere between min_conns and max_conns (see the sketch after this list).
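Below is a conceptual sketch of how bootstrap, min_conns and max_conns could interact. dialPeer, connCount, findMorePeers and pruneOldest are hypothetical helpers standing in for the host's routing and connection manager; this is not the plan's implementation.

```go
package connsketch

import "time"

type netParams struct {
	Bootstrap int // peers dialed directly at startup
	MinConns  int // routing keeps looking for peers while below this
	MaxConns  int // above this, connections get pruned
}

// maintain dials the bootstrap peers, then periodically nudges the
// connection count back into the [MinConns, MaxConns] window.
func maintain(p netParams, bootstrapPeers []string) {
	for i := 0; i < p.Bootstrap && i < len(bootstrapPeers); i++ {
		dialPeer(bootstrapPeers[i])
	}
	for range time.Tick(10 * time.Second) {
		switch n := connCount(); {
		case n < p.MinConns:
			// Ask the DHT routing layer for more peers to connect to.
			findMorePeers(p.MinConns - n)
		case n > p.MaxConns:
			// Prune back towards the low watermark.
			pruneOldest(n - p.MinConns)
		}
	}
}

func dialPeer(addr string) {}
func connCount() int       { return 0 }
func findMorePeers(n int)  {}
func pruneOldest(n int)    {}
```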
Traffic Shaping

We assume peers will have different network conditions and randomly generate parameters between a minimum and maximum.

  • latency is generated between a given min and max. We use values expected in a relatively close geographic area, i.e. Europe, as our use case optimizes for localized usage.
  • bandwidth is generated based on the average bandwidth of our target user segment: end users of recent, mid- to high-performance devices. We may also try slower connections to get an idea of performance there.
  • jitter is generated based on typical values encountered in real-world situations where users may be on more brittle wireless connections rather than wired, Ethernet-connected servers (see the sketch after this list).
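The sketch below shows how such parameters could be drawn uniformly between a minimum and a maximum. The field names and the concrete ranges are illustrative assumptions, not the plan's actual parameters.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type shape struct {
	Latency   time.Duration
	Bandwidth uint64 // bits per second
	Jitter    time.Duration
}

// randomShape draws one set of per-node network conditions.
func randomShape(r *rand.Rand) shape {
	between := func(min, max int64) int64 { return min + r.Int63n(max-min+1) }
	return shape{
		// European-scale latencies, e.g. 5-100 ms (assumed range).
		Latency: time.Duration(between(5, 100)) * time.Millisecond,
		// Typical end-user broadband, e.g. 20-300 Mbit/s (assumed range).
		Bandwidth: uint64(between(20, 300)) * 1_000_000,
		// Jitter typical of consumer wireless links, e.g. 1-30 ms (assumed range).
		Jitter: time.Duration(between(1, 30)) * time.Millisecond,
	}
}

func main() {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	fmt.Printf("%+v\n", randomShape(r))
}
```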

Gossip

In these experiments we compare two implementations: address broadcast (AB), in which the publisher's network address is sent along with the message and the recipient streams the response back to it directly, and recursive forwarding (RF), in which each peer in the gossip transmission chain forwards the response back to the previous sender.
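The following sketch contrasts the two response paths at a conceptual level. The Query type and the dialAndSend / replyOnInbound helpers are hypothetical; the actual plan exchanges messages over libp2p gossip and streams.

```go
package gossipsketch

type Query struct {
	ContentID string
	// Address broadcast: the publisher's network address travels with the
	// message so any recipient holding the content can dial it directly.
	PublisherAddr string
}

// handleAB (address broadcast): the provider opens a new connection straight
// to the publisher whose address was carried in the query.
func handleAB(q Query, offer []byte) {
	dialAndSend(q.PublisherAddr, offer)
}

// handleRF (recursive forwarding): the provider replies to whichever peer
// forwarded the query to it, and each hop relays the response back along the
// gossip path, reusing connections that already exist.
func handleRF(from string, offer []byte) {
	replyOnInbound(from, offer)
}

func dialAndSend(addr string, b []byte)    {}
func replyOnInbound(peer string, b []byte) {}
```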

Low Content Replication

This composition evaluates the performance of the network with low to no content replication over a large network.

Results

Ubuntu AMD Ryzen 9 3900XT 12-Core Processor - 64GiB DDR4

Typical time to first offer: mean ± standard deviation (24 samples)

| Solution | 10 Instances (ms) | 20 Instances (ms) | 30 Instances (ms) | 40 Instances (ms) |
| --- | --- | --- | --- | --- |
| gossip - AB | 825 ±104 | 1337 ±1071 | 2111 ±1376 | 2081 ±1275 |
| gossip - RF | 1014 ±344 | 1187 ±447 | 1186 ±442 | 1445 ±457 |
Interpretation

As the size of the network grows, the client becomes less likely to be directly connected to the provider of the content it's looking for. Dialing the provider incurs overhead that can slow a query down by 2-3x. As a result, the standard deviation reflects the very large gap between queries in which client and provider are directly connected and those in which they are not. When forwarding back to the previous sender we reuse existing connections, so the speed depends on the number of hops between peers, which makes it more consistent.

High Content Replication

This composition evaluates the performance of the network with high content replication over a large network.

Results

Ubuntu AMD Ryzen 9 3900XT 12-Core Processor - 64GiB DDR4

Typical time to first offer: mean ± standard deviation (24 samples)

| Solution | 10 Instances (ms) | 20 Instances (ms) | 30 Instances (ms) | 40 Instances (ms) |
| --- | --- | --- | --- | --- |
| gossip - AB | 697 ±25 | 720 ±40 | 844 ±499 | 776 ±101 |
| gossip - RF | 686 ±24 | 714 ±25 | 817 ±234 | 906 ±333 |
Interpretation

Higher content replication means a higher likelihood that the client is directly connected to a provider, so a majority of queries execute very quickly even with high latency and jitter. Recursive forwarding yields very similar speeds.

Network Segmentation By Region

This composition demonstrates the segmentation of the gossip network into different topics based on geographic regions. Messages are only published to a topic in which the subscribers have relatively similar latency.
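A hedged sketch of joining a region-scoped topic with go-libp2p-pubsub is shown below. The API calls match go-libp2p-pubsub releases from around the time of this plan, but the myel/discovery/<region> topic naming is an assumption made for illustration, not necessarily the plan's actual topic format.

```go
package regiontopics

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p-core/host"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// joinRegion subscribes a node to the gossip topic for its region so that
// queries only propagate among peers with similar latency.
func joinRegion(ctx context.Context, h host.Host, region string) (*pubsub.Topic, error) {
	ps, err := pubsub.NewGossipSub(ctx, h)
	if err != nil {
		return nil, err
	}
	// Topic naming is assumed here; the real plan may use a different scheme.
	topic, err := ps.Join(fmt.Sprintf("myel/discovery/%s", region))
	if err != nil {
		return nil, err
	}
	// Subscribing registers interest; the plan's message handler would read
	// incoming queries from the returned subscription.
	if _, err := topic.Subscribe(); err != nil {
		return nil, err
	}
	return topic, nil
}

// publishQuery sends a content query only to subscribers of the region topic.
func publishQuery(ctx context.Context, topic *pubsub.Topic, query []byte) error {
	return topic.Publish(ctx, query)
}
```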

Results

Ubuntu AMD Ryzen 9 3900XT 12-Core Processor - 64GiB DDR4

Typical time to first offer: mean ± standard deviation (24 samples)

| Solution | 10 Instances (ms) | 20 Instances (ms) | 30 Instances (ms) | 40 Instances (ms) |
| --- | --- | --- | --- | --- |
| gossip - AB | 96 ±16 | 132 ±129 | 125 ±31 | 240 ±252 |
| gossip - RF | 148 ±87 | 143 ±51 | 136 ±50 | 168 ±78 |
Interpretation

Since the message is only published to the lowest-latency peers in the network, propagation is extremely fast. Even though peers are still likely to dial each other, when they are very close this extra step doesn't impact speed as significantly as it does for higher-latency peers. RF is very similar in speed but significantly more consistent across samples.

Network Segmentation By Content

This composition demonstrates the segmentation of the gossip network by type of content, i.e. an application creates its own subnetwork in which clients query a dedicated topic to find their content. Currently the discovery session can only publish to a region topic, so we reuse region topics for this test but, unlike the previous test, include peers with different latencies.
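Purely as an illustration of the idea, the sketch below derives an application-scoped topic name. Since the discovery session currently only publishes to region topics, this naming scheme is hypothetical and not an implemented format.

```go
package contenttopics

import "fmt"

// contentTopic derives a per-application topic so that only peers interested
// in that application's content would subscribe to it. Hypothetical naming.
func contentTopic(appNamespace string) string {
	return fmt.Sprintf("myel/discovery/app/%s", appNamespace)
}
```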

Results

Ubuntu AMD Ryzen 9 3900XT 12-Core Processor - 64GiB DDR4

Typical time to first offer: mean ± standard deviation (24 samples)

| Solution | 10 Instances (ms) | 20 Instances (ms) | 30 Instances (ms) | 40 Instances (ms) |
| --- | --- | --- | --- | --- |
| gossip - AB | 707 ±29 | 713 ±36 | 1076 ±881 | 1552 ±1206 |
| gossip - RF | 708 ±28 | 752 ±146 | 863 ±267 | 846 ±258 |
Interpretation

Segmenting the network into non-geographic topics does not seem to improve performance at this scale, because peers with different latencies can subscribe to the same topic, so there is no guarantee that peers will be nearby. RF appears to perform better in these conditions, possibly because a smaller number of subscribers means fewer hops while peers are still unlikely to be directly connected.
