rfc9616v2.txt   rfc9616.txt 
skipping to change at line 73 skipping to change at line 73
7. IANA Considerations 7. IANA Considerations
8. Security Considerations 8. Security Considerations
9. References 9. References
9.1. Normative References 9.1. Normative References
9.2. Informative References 9.2. Informative References
Acknowledgements Acknowledgements
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
"The Babel Routing Protocol" [RFC8966] does not mandate a specific The Babel routing protocol [RFC8966] does not mandate a specific
algorithm for computing metrics; existing implementations use a algorithm for computing metrics; existing implementations use a
packet-loss-based metric on wireless links and a simple hop-count packet-loss-based metric on wireless links and a simple hop-count
metric on all other types of links. While this strategy works metric on all other types of links. While this strategy works
reasonably well in many networks, it fails to select reasonable reasonably well in many networks, it fails to select reasonable
routes in some topologies involving tunnels or VPNs. routes in some topologies involving tunnels or VPNs.
+------------+ +------------+
| A (Paris) +---------------+ | A (Paris) +---------------+
+------------+ \ +------------+ \
/ \ / \
skipping to change at line 99 skipping to change at line 99
\ / \ /
\ / \ /
\ / \ /
+------------+ / +------------+ /
| D (Paris) +---------------+ | D (Paris) +---------------+
+------------+ +------------+
Figure 1: Four Routers in a Diamond Topology Figure 1: Four Routers in a Diamond Topology
For example, consider the topology described in Figure 1, with three For example, consider the topology described in Figure 1, with three
routers (A, B, and D) located in Paris and a fourth router (C) routers A, B, and D located in Paris and a fourth router C located in
located in Tokyo, connected through tunnels in a diamond topology. Tokyo, connected through tunnels in a diamond topology. When routing
When routing traffic from A to D, it is obviously preferable to use traffic from A to D, it is obviously preferable to use the local
the local route through B as this is likely to provide better service route through B as this is likely to provide better service quality
quality and lower monetary cost than the distant route through C. and lower monetary cost than the distant route through C. However,
However, the existing implementations of Babel consider both routes the existing implementations of Babel consider both routes as having
as having the same metric; therefore, they will route the traffic the same metric; therefore, they will route the traffic through C in
through C in roughly half the cases. roughly half the cases.
In the first part of this document (Section 3), we specify an In the first part of this document (Section 3), we specify an
extension to the Babel routing protocol that produces a sequence of extension to the Babel routing protocol that produces a sequence of
accurate measurements of the round-trip time (RTT) between two Babel accurate measurements of the round-trip time (RTT) between two Babel
neighbours. These measurements are not directly usable as an input neighbours. These measurements are not directly usable as an input
to Babel's route-selection procedure since they tend to be noisy and to Babel's route selection procedure since they tend to be noisy and
to cause a negative feedback loop, which might give rise to frequent to cause a negative feedback loop, which might give rise to frequent
oscillations. In the second part (Section 4), we define an algorithm oscillations. In the second part (Section 4), we define an algorithm
that maps the sequence of RTT samples to a link cost that can be used that maps the sequence of RTT samples to a link cost that can be used
for route selection. for route selection.
1.1. Applicability 1.1. Applicability
The extension defined in Section 3 provides a sequence of accurate, The extension defined in Section 3 provides a sequence of accurate
but potentially noisy, RTT samples. Since the RTT is a symmetric but potentially noisy RTT samples. Since the RTT is a symmetric
measure of delay, this protocol is only applicable in environments measure of delay, this protocol is only applicable in environments
where the symmetric delay is a good predictor of whether a link where the symmetric delay is a good predictor of whether a link
should be taken by routing traffic, which might not necessarily be should be taken by routing traffic, which might not necessarily be
the case in networks built over exotic link technologies. the case in networks built over exotic link technologies.
The extension makes minimal requirements on the nodes. In The extension makes minimal requirements on the nodes. In
particular, it does not assume synchronised clocks; it only requires particular, it does not assume synchronised clocks, and only requires
that clock drift be negligible during the time interval between two that clock drift be negligible during the time interval between two
Hello TLVs. Since that is on the order of a few seconds, this Hello TLVs. Since that is on the order of a few seconds, this
requirement is met even with cheap crystal oscillators, such as the requirement is met even with cheap crystal oscillators, such as the
ones used in consumer electronics. ones used in consumer electronics.
The algorithm defined in Section 4 is based on a number of The algorithm defined in Section 4 depends on a number of assumptions
assumptions about the network. The assumption with the most severe about the network. The assumption with the most severe consequences
consequences is that all links below a certain RTT (rtt-min in is that all links below a certain RTT (rtt-min in Section 4.2) can be
Section 4.2) can be grouped in a single category of "good" links. grouped in a single category of "good" links. While this is the case
While this is the case in wide-area overlay networks, it makes the in wide-area overlay networks, it makes the algorithm inapplicable in
algorithm inapplicable in networks where distinguishing between low- networks where distinguishing between low-latency links is important.
latency links is important.
There are other assumptions, but they are less likely to limit the There are other assumptions, but they are less likely to limit the
algorithm's applicability. The algorithm assumes that all links algorithm's applicability. The algorithm assumes that all links
above a certain RTT (rtt-max in Section 4.2) are equally bad, and above a certain RTT (rtt-max in Section 4.2) are equally bad, and
they will only be used as a last resort. In addition, in order to they will only be used as a last resort. In addition, in order to
avoid oscillations, the algorithm is designed to react slowly to RTT avoid oscillations, the algorithm is designed to react slowly to RTT
variations, thus causing suboptimal routing for seconds or even variations, thus causing suboptimal routing for seconds or even
minutes after an RTT change. While this is a desirable property in minutes after an RTT change; while this is a desirable property in
fixed networks, as it avoid excessive route oscillations, it might be fixed networks, as it avoid excessive route oscillations, it might be
an issue with networks with high rates of node mobility. an issue with networks with high rates of node mobility.
2. Specification of Requirements 2. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
skipping to change at line 184 skipping to change at line 183
* the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according * the Origin Timestamp, a 32-bit timestamp (modulo 2^32) according
to the neighbour's clock; to the neighbour's clock;
* the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according * the Receive Timestamp, a 32-bit timestamp (modulo 2^32) according
to the local clock. to the local clock.
Both values are initially undefined. Both values are initially undefined.
3.2. Protocol Operation 3.2. Protocol Operation
The RTT to a neighbour is estimated using an algorithm by Mills The RTT to a neighbour is estimated using an algorithm due to Mills
[RFC891], originally developed for the HELLO routing protocol and [RFC891], originally developed for the HELLO routing protocol and
later used in NTP [RFC5905]. later used in NTP [RFC5905].
A Babel speaker periodically sends Hello messages to its neighbours A Babel speaker periodically sends Hello messages to its neighbours
(Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a (Section 3.4.1 of [RFC8966]). Additionally, it occasionally sends a
set of IHU ("I Heard You") messages, at most one per neighbour set of IHU ("I Heard You") messages, at most one per neighbour
(Section 3.4.2 of [RFC8966]). (Section 3.4.2 of [RFC8966]).
A B A B
| | | |
skipping to change at line 219 skipping to change at line 218
| / | | / |
| / | Hello(t2') | / | Hello(t2')
| / | IHU(t1, t1') | / | IHU(t1, t1')
|/ | |/ |
t2 + | t2 + |
| | | |
v v v v
Figure 2: Mills' Algorithm Figure 2: Mills' Algorithm
In order to enable the computation of RTTs, node A MUST include, in In order to enable the computation of RTTs, a node A MUST include, in
every Hello that it sends, a timestamp t1 (according to A's local every Hello that it sends, a timestamp t1 (according to A's local
clock), as illustrated in Figure 2. When node B receives A's clock), as illustrated in Figure 2. When a node B receives A's
timestamped Hello, it computes the time t1' at which the Hello was timestamped Hello, it computes the time t1' at which the Hello was
received (according to B's local clock). It then MUST record the received (according to B's local clock). It then MUST record the
value t1 in the Origin Timestamp field of the Neighbour Table entry value t1 in the Origin Timestamp field of the Neighbour Table entry
corresponding to A and the value t1' in the Receive Timestamp field corresponding to A and the value t1' in the Receive Timestamp field
of the Neighbour Table entry. of the Neighbour Table entry.
When B sends an IHU to A, it checks whether both timestamps are When B sends an IHU to A, it checks whether both timestamps are
defined in the Neighbour Table. If that is the case, then it MUST defined in the Neighbour Table. If that is the case, then it MUST
ensure that its IHU TLV is sent in a packet that also contains a ensure that its IHU TLV is sent in a packet that also contains a
timestamped Hello TLV (either a normally scheduled Hello or an timestamped Hello TLV (either a normally scheduled Hello or an
unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include unscheduled Hello, see Section 3.4.1 of [RFC8966]). It MUST include
in the IHU both the Origin Timestamp and the Receive Timestamp stored in the IHU both the Origin Timestamp and the Receive Timestamp stored
in the Neighbour Table. in the Neighbour Table.
Upon receiving B's packet, A computes the time t2 (according to its Upon receiving B's packet, A computes the time t2 (according to its
local clock) at which it was received. Then, A MUST verify that it local clock) at which it was received. Node A MUST then verify that
contains both a Hello TLV with timestamp t2' and an IHU TLV with two it contains both a Hello TLV with timestamp t2' and an IHU TLV with
timestamps: t1 and t1'. If that is the case, A computes the value: two timestamps t1 and t1'. If that is the case, A computes the
value:
RTT = (t2 - t1) - (t2' - t1') RTT = (t2 - t1) - (t2' - t1')
(where all computations are done modulo 2^32), which is a measurement (where all computations are done modulo 2^32), which is a measurement
of the RTT between A and B. (A then stores the values t2' and t2 in of the RTT between A and B. (A then stores the values t2' and t2 in
its Neighbour Table, as B did before.) its Neighbour Table, as B did before.)
This algorithm has a number of desirable properties: This algorithm has a number of desirable properties:
1. The algorithm is symmetric. A and B use the same procedures for 1. The algorithm is symmetric. A and B use the same procedures for
timestamping packets and computing RTT samples; both nodes timestamping packets and computing RTT samples: both nodes
produce one RTT sample for each received (Hello, IHU) pair. produce one RTT sample for each received (Hello, IHU) pair.
2. Since there is no requirement that t1' and t2' be equal, the 2. Since there is no requirement that t1' and t2' be equal, the
protocol is asynchronous: the only change to Babel's message protocol is asynchronous: the only change to Babel's message
scheduling is the requirement that a packet containing an IHU scheduling is the requirement that a packet containing an IHU
also contain a Hello. also contain a Hello.
3. Since the algorithm only ever computes differences of timestamps 3. Since the algorithm only ever computes differences of timestamps
according to a single clock, it does not require synchronised according to a single clock, it does not require synchronised
clocks. clocks.
skipping to change at line 337 skipping to change at line 337
The protocol is designed to survive the clock being reset when a node The protocol is designed to survive the clock being reset when a node
reboots; on POSIX systems, this makes it possible to use the reboots; on POSIX systems, this makes it possible to use the
CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC CLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC
is not available, CLOCK_REALTIME may be used, since the protocol is is not available, CLOCK_REALTIME may be used, since the protocol is
able to survive the clock being occasionally stepped. able to survive the clock being occasionally stepped.
4. RTT-Based Route Selection 4. RTT-Based Route Selection
The protocol described above yields a series of RTT samples. While The protocol described above yields a series of RTT samples. While
these samples are fairly accurate, they are not directly usable as an these samples are fairly accurate, they are not directly usable as an
input to the route-selection procedure, for at least three reasons: input to the route selection procedure, for at least three reasons:
1. In the presence of bursty traffic, routers experience transient 1. In the presence of bursty traffic, routers experience transient
congestion, which causes occasional spikes in the measured RTT. congestion, which causes occasional spikes in the measured RTT.
Thus, the RTT signal may be noisy and require smoothing before it Thus, the RTT signal may be noisy and require smoothing before it
can be used for route selection. can be used for route selection.
2. Using the RTT signal for route selection gives rise to a negative 2. Using the RTT signal for route selection gives rise to a negative
feedback loop. When a route has a low RTT, it is deemed to be feedback loop: when a route has a low RTT, it is deemed to be
more desirable: this causes it to be used for more data traffic, more desirable: this causes it to be used for more data traffic,
which may lead to congestion, which in turn increases the RTT. which may lead to congestion, which in turn increases the RTT.
Without some form of hysteresis, using RTT for route selection Without some form of hysteresis, using RTT for route selection
would lead to oscillations between parallel routes, which would would lead to oscillations between parallel routes, which would
lead to packet reordering and negatively affect upper-layer lead to packet reordering and negatively affect upper-layer
protocols (such as TCP). protocols (such as TCP).
3. Even in the absence of congestion, the RTT tends to exhibit some 3. Even in the absence of congestion, the RTT tends to exhibit some
variation. If the RTTs of two parallel routes oscillate around a variation. If the RTTs of two parallel routes oscillate around a
common value, using the RTT as input to route selection will common value, using the RTT as input to route selection will
cause frequent routing oscillations, which, again, indicates the cause frequent routing oscillations, which, again, indicates the
need for some form of hysteresis. need for some form of hysteresis.
In this section, we describe an algorithm that integrates smoothing In this section, we describe an algorithm that integrates smoothing
and hysteresis. It has also been shown to behave well both in and hysteresis. It has been shown to behave well both in simulation
simulation and experimentally over the Internet [DELAY-BASED] and is and experimentally over the Internet [DELAY-BASED] and is RECOMMENDED
RECOMMENDED when RTT information is being used for route selection. when RTT information is being used for route selection. The
The algorithm is structured as follows: algorithm is structured as follows:
* the RTT values are first smoothed in order to avoid instabilities * the RTT values are first smoothed in order to avoid instabilities
due to outliers (Section 4.1); due to outliers (Section 4.1);
* the resulting smoothed samples are mapped to a cost using a * the resulting smoothed samples are mapped to a cost using a
bounded, non-linear mapping, which avoids instabilities at the bounded, non-linear mapping, which avoids instabilities at the
lower and upper end of the RTT range (Section 4.2); lower and upper end of the RTT range (Section 4.2);
* a hysteresis filter is applied in order to limit the amount of * a hysteresis filter is applied in order to limit the amount of
oscillation in the middle of the RTT range (Section 4.3). oscillation in the middle of the RTT range (Section 4.3).
skipping to change at line 406 skipping to change at line 406
0.836 is the RECOMMENDED default. 0.836 is the RECOMMENDED default.
4.2. Cost Computation 4.2. Cost Computation
The smoothed RTT value obtained in the previous step needs to be The smoothed RTT value obtained in the previous step needs to be
mapped to a link cost, suitable for input to the metric computation mapped to a link cost, suitable for input to the metric computation
procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping procedure (Section 3.5.2 of [RFC8966]). Obviously, the mapping
should be monotonic (larger RTTs imply larger costs). In addition, should be monotonic (larger RTTs imply larger costs). In addition,
the mapping should be constant beyond a certain value (all very bad the mapping should be constant beyond a certain value (all very bad
links are equally bad) so that congested links do not contribute to links are equally bad) so that congested links do not contribute to
routing instability. The mapping should also be constant around 0; routing instability. The mapping should also be constant around 0,
this is so small oscillations in the RTT of low-RTT links do not so that small oscillations in the RTT of low-RTT links do not
contribute to routing instability. contribute to routing instability.
cost cost
^ ^
| |
| |
| C + max-rtt-penalty | C + max-rtt-penalty
| +--------------------------- | +---------------------------
| /. | /.
| / . | / .
skipping to change at line 474 skipping to change at line 474
[RFC8966], implementations that do not understand this extension will [RFC8966], implementations that do not understand this extension will
silently ignore the sub-TLVs while parsing the rest of the TLVs that silently ignore the sub-TLVs while parsing the rest of the TLVs that
they contain. In effect, this extension supports building hybrid they contain. In effect, this extension supports building hybrid
networks consisting of extended and unextended routers; while such networks consisting of extended and unextended routers; while such
networks might suffer from sub-optimal routing, they will not suffer networks might suffer from sub-optimal routing, they will not suffer
from blackholes or routing loops. from blackholes or routing loops.
If a sub-TLV defined in this extension is longer than expected, the If a sub-TLV defined in this extension is longer than expected, the
additional data is silently ignored. This provision is made in order additional data is silently ignored. This provision is made in order
to allow a future version of this protocol to extend the packet to allow a future version of this protocol to extend the packet
format with additional data, for example, high-precision or absolute format with additional data, for example high-precision or absolute
timestamps. timestamps.
6. Packet Format 6. Packet Format
This extension defines the Timestamp sub-TLV whose Type field has a This extension defines the Timestamp sub-TLV whose Type field has the
value of 3. This sub-TLV can be contained within a Hello sub-TLV, in value 3. This sub-TLV can be contained within a Hello sub-TLV, in
which case it carries a single timestamp, or within an IHU sub-TLV, which case it carries a single timestamp, or within an IHU sub-TLV,
in which case it carries two timestamps. in which case it carries two timestamps.
Timestamps are encoded as 32-bit unsigned integers (modulo 2^32), Timestamps are encoded as 32-bit unsigned integers (modulo 2^32),
expressed in units of one microsecond, counting from an arbitrary expressed in units of one microsecond, counting from an arbitrary
origin. Timestamps wrap around every 4295 seconds, or roughly 71 origin. Timestamps wrap around every 4295 seconds, or roughly 71
minutes (see also Section 3.3). minutes (see also Section 3.3).
6.1. Timestamp Sub-TLV in Hello TLVs 6.1. Timestamp Sub-TLV in Hello TLVs
 End of changes. 18 change blocks. 
39 lines changed or deleted 39 lines changed or added

This html diff was produced by rfcdiff 1.48.