Persistent packet loss statistics in rtcp

Brief introduction

An RTCP RR (receiver report) packet carries two fields, fraction lost and cumulative number of packets lost, which report the packet loss rate and the total number of packets lost. This article covers how these two values are defined and how they are implemented in Chrome's WebRTC code.

Background

This started when I came across a method of counting lost packets. Roughly: take the seq number of the packet at new time (the arrival time of the latest packet), add 1, and subtract the seq number of the first packet in the window that ends at new time and spans window time (the length of the statistical window). The result is the expected number of packets. The number of packets actually received in that window is the receive count. The loss count is then expected number - receive count. It looks like this:

The diagram below shows sequence numbers against arrival time. To keep the arithmetic simple, time has no unit.

-> sequence number
1 2 3 4 5 [6 7 9 10 11] seq number
1 2 3 4 5 [6 7 8 9  10] time
-> time

Assume that window time is 5
seq number at new time: 11
seq number at the start of the window: 6
expected number: 11 + 1 - 6 = 6, [6, 7, 8, 9, 10, 11]
receive count: 5
loss count: 6 - 5 = 1
This looks fine, but it breaks when packets arrive out of order, which is common with UDP. For example:

1 2 3 4 5 [6 7 9 11 10] seq number
1 2 3 4 5 [6 7 8 9  10] time

Assume that window time is 5
seq number at new time: 10
seq number at the start of the window: 6
expected number: 10 + 1 - 6 = 5, [6, 7, 8, 9, 10]
receive count: 5
loss count: 5 - 5 = 0
This time the calculation gives the wrong value: packet 8 is still missing, so the loss count should be 1.
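The two worked examples can be reproduced with a short C++ sketch. This is only an illustration of the window method as described here, not code from any project; the names `Arrival` and `NaiveWindowLoss` are hypothetical, and sequence number wraparound is ignored.

```cpp
#include <cassert>
#include <vector>

// One received packet: its RTP sequence number and its unitless arrival time.
struct Arrival {
  int seq;
  int time;
};

// Naive window-based loss estimate (hypothetical helper for illustration).
// `arrivals` is sorted by arrival time; the window covers the last `window`
// time units, ending at the arrival time of the newest packet.
int NaiveWindowLoss(const std::vector<Arrival>& arrivals, int window) {
  assert(!arrivals.empty());
  const int newest_time = arrivals.back().time;
  int first_seq = 0;
  int received = 0;
  for (const Arrival& a : arrivals) {
    if (a.time > newest_time - window) {
      if (received == 0) first_seq = a.seq;  // first packet inside the window
      ++received;
    }
  }
  // Expected count uses the seq number of the LATEST packet, which is
  // exactly what goes wrong when packets are reordered.
  const int expected = arrivals.back().seq + 1 - first_seq;
  return expected - received;
}
```

With the in-order arrivals above the function returns 1. With the reordered arrivals it returns 0, even though packet 8 never arrived.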
In fact, on closer thought, reordering is not the only problem. The same method also goes wrong when packets with duplicate sequence numbers arrive, or when a packet carries a wrong sequence number (one that jumps by a large amount). So how does standard WebRTC handle this?

Implementation in licode

I looked at licode's implementation first. (There is a reason for starting there, but never mind.) licode counts packet loss as follows:

// Called after an RTP packet is received
bool RtcpRrGenerator::handleRtpPacket(std::shared_ptr<DataPacket> packet) {
  /* ......
    A lot of irrelevant code is omitted
    ......*/
  uint16_t seq_num = head->getSeqNumber();
  rr_info_.packets_received++;  // Count every received packet (remember this)
  if (rr_info_.base_seq == -1) {
    rr_info_.base_seq = head->getSeqNumber();  // First RTP packet: record its seq number (remember this)
  }
  if (rr_info_.max_seq == -1) {  // First RTP packet received
    rr_info_.max_seq = seq_num;
  } else if (!RtpUtils::sequenceNumberLessThan(seq_num, rr_info_.max_seq)) {
    // sequenceNumberLessThan already accounts for wraparound, so reaching this
    // branch means seq_num is not behind max_seq
    if (seq_num < rr_info_.max_seq) {  // Wraparound check
      rr_info_.cycle++;  // The seq number wrapped around: count one more cycle
    }
    rr_info_.max_seq = seq_num;  // rr_info_.max_seq always holds the largest seq number seen
  }
  // Combine the cycle count and max_seq into the extended highest seq number (remember this)
  rr_info_.extended_seq = (rr_info_.cycle << 16) | rr_info_.max_seq;
  
  return false;
}

// Generate the RR packet
std::shared_ptr<DataPacket> RtcpRrGenerator::generateReceiverReport() {
  /* ......
    A lot of irrelevant code is omitted
    ......*/    
  uint64_t now = ClockUtils::timePointToMs(clock_->now());  // Current time
  // Total expected packets = extended highest seq number - first seq number + 1
  uint32_t expected = rr_info_.extended_seq - rr_info_.base_seq + 1;  
  
  // Expected packets in this interval = total expected now - total expected at the previous report
  uint32_t expected_interval = expected - rr_info_.expected_prior;
  // Save the current total expected for the next report
  rr_info_.expected_prior = expected; 
  // Packets received in this interval = total received now - total received at the previous report
  uint32_t received_interval = rr_info_.packets_received - rr_info_.received_prior; 
  // Save the current total received for the next report
  rr_info_.received_prior = rr_info_.packets_received;
  // Packets lost in this interval = expected in this interval - received in this interval
  int64_t lost_interval = static_cast<int64_t>(expected_interval) - received_interval;

  // TODO(pedro): We're getting closer to packet loss without retransmissions by ignoring negative
  // lost in the interval. This is not perfect but will provide a more "monotonically increasing" behavior
  if (lost_interval > 0) {  // Only positive values are counted. The TODO above is the licode author's own comment
    // lost accumulates the per-interval losses, i.e. the total number of lost packets
    rr_info_.lost += lost_interval;
  }
  // Write the total loss count into the RTCP header
  rtcp_head.setLostPackets(rr_info_.lost);
  
  return (std::make_shared<DataPacket>(0, reinterpret_cast<char*>(&packet_), length, type_));
}

Reading the code and comments above carefully shows that licode does not use the seq number of the most recently received packet. Instead, the expected count is the largest seq number received minus the first seq number received, plus 1, and the received count is the number of all packets received. So licode does handle out-of-order packets. That leaves my other question open: it does not handle duplicate packets or packets with a wrong seq number. Let's see how Chrome does it.
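To make the bookkeeping concrete, here is a minimal standalone C++ sketch of the same idea. It is not licode's code: `LossTracker`, `IsNewer`, `OnPacket` and `Report` are hypothetical names, and the wraparound comparison is my own assumption about how a helper like `sequenceNumberLessThan` behaves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker modeled loosely on the bookkeeping described above.
struct LossTracker {
  bool started = false;
  uint16_t base_seq = 0;          // first seq number received
  uint16_t max_seq = 0;           // highest seq number received (16-bit)
  uint32_t cycles = 0;            // number of 16-bit wraparounds seen
  uint32_t packets_received = 0;  // every arrival, including duplicates
  uint32_t expected_prior = 0;
  uint32_t received_prior = 0;
  int64_t lost = 0;

  // True if `a` is ahead of `b` in wraparound-aware 16-bit order.
  static bool IsNewer(uint16_t a, uint16_t b) {
    return a != b && static_cast<uint16_t>(a - b) < 0x8000;
  }

  void OnPacket(uint16_t seq) {
    ++packets_received;
    if (!started) {
      started = true;
      base_seq = max_seq = seq;
      return;
    }
    if (IsNewer(seq, max_seq)) {
      if (seq < max_seq) ++cycles;  // newer but numerically smaller: wrapped
      max_seq = seq;
    }
  }

  // Returns the accumulated loss. Negative intervals are ignored, mirroring
  // the `lost_interval > 0` check in the listing above.
  int64_t Report() {
    const uint32_t extended_max = (cycles << 16) | max_seq;
    const uint32_t expected = extended_max - base_seq + 1;
    const int64_t expected_interval = expected - expected_prior;
    expected_prior = expected;
    const int64_t received_interval = packets_received - received_prior;
    received_prior = packets_received;
    const int64_t lost_interval = expected_interval - received_interval;
    if (lost_interval > 0) lost += lost_interval;
    return lost;
  }
};
```

Feeding the reordered sequence 6, 7, 9, 11, 10 and calling `Report()` gives 1, so reordering no longer hides the loss of packet 8. If 11 then arrives twice more, followed by 12, the next interval has 1 expected packet and 3 received, and the negative result is simply dropped.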

Implementation in chrome

// Called after an RTP packet is received
void StreamStatisticianImpl::UpdateCounters(const RtpPacketReceived& packet) {
  /* ......
    A lot of irrelevant code is omitted
    ......*/    
  int64_t now_ms = clock_->TimeInMilliseconds();
  // Decrement the cumulative loss count for every received packet
  --cumulative_loss_;

  // Unwrap the 16-bit seq number to handle wraparound
  int64_t sequence_number =
      seq_unwrapper_.UnwrapWithoutUpdate(packet.SequenceNumber());

  if (!ReceivedRtpPacket()) {  // First RTP packet received
    // Record the seq number of the first RTP packet (remember this)
    received_seq_first_ = sequence_number;

    last_report_seq_max_ = sequence_number - 1; 
    // Largest seq number received so far
    received_seq_max_ = sequence_number - 1; 
    receive_counters_.first_packet_time_ms = now_ms;
  } else if (UpdateOutOfOrder(packet, sequence_number, now_ms)) {
    // Important: check whether this packet is out of order (implementation below). If it is, return immediately
    return;
  }
  // In-order packet handling


  // Update the loss count
  // Example: sequence_number = 3, received_seq_max_ = 2, so cumulative_loss_ += 1
  // This cancels the --cumulative_loss_ at the top, so the value only grows when packets are really lost
  cumulative_loss_ += sequence_number - received_seq_max_;
  // received_seq_max_ now holds the largest seq number
  received_seq_max_ = sequence_number;
}

// Check whether the packet just received is out of order
bool StreamStatisticianImpl::UpdateOutOfOrder(const RtpPacketReceived& packet,
                                              int64_t sequence_number,
                                              int64_t now_ms) {
    /* ......
    A little irrelevant code is omitted
    ......*/  
  // received_seq_out_of_order_ serves two purposes:
  // 1. If it has a value, the previous packet had a seq number far away from received_seq_max_ (I'll call such a packet a "seq jump packet").
  // 2. The value itself is the seq number of that previous packet.
  // PS: when does a seq jump packet occur? Judging from the source comments, on a stream restart.
  if (received_seq_out_of_order_) {
    // Decrement cumulative_loss_. As the logic below shows, the decrement for a seq jump packet is undone when it arrives; it is applied here instead, when the packet after the seq jump packet arrives
    --cumulative_loss_;
    // Seq number expected for this packet
    uint16_t expected_sequence_number = *received_seq_out_of_order_ + 1;
    // Clear the marker
    received_seq_out_of_order_ = absl::nullopt;
    // If this packet directly follows the seq jump packet, the stream is considered to have restarted and the seq number baseline must be reset
    if (packet.SequenceNumber() == expected_sequence_number) {
      // The English comment below comes from the original source code
      // Ignore sequence number gap caused by stream restart for packet loss
      // calculation, by setting received_seq_max_ to the sequence number just
      // before the out-of-order seqno. This gives a net zero change of
      // `cumulative_loss_`, for the two packets interpreted as a stream reset.
      //
      // Fraction loss for the next report may get a bit off, since we don't
      // update last_report_seq_max_ and last_report_cumulative_loss_ in a
      // consistent way.

      // Set received_seq_max_ to just before the seq jump packet
      // This cancels the two --cumulative_loss_ operations. If this is unclear, read it together with the explanation below
      received_seq_max_ = sequence_number - 2;
      // Return false so that the in-order packet logic continues
      return false;
    }
  }

  // Check whether the seq number of this packet is too far from received_seq_max_, i.e. whether this is a seq jump packet.
  if (std::abs(sequence_number - received_seq_max_) >
      max_reordering_threshold_) {
    // Sequence number gap looks too large, wait until next packet to check
    // for a stream restart.
    // If the gap is too large, record this packet's seq number as the marker.
    received_seq_out_of_order_ = packet.SequenceNumber();
    // Postpone counting this as a received packet until we know how to update
    // `received_seq_max_`, otherwise we temporarily decrement
    // `cumulative_loss_`. The
    // ReceiveStatisticsTest.StreamRestartDoesntCountAsLoss test expects
    // `cumulative_loss_` to be unchanged by the reception of the first packet
    // after stream reset.
    // Undo the decrement made for this packet at the start. The source comment says counting is postponed; I am not sure why postponing is needed
    ++cumulative_loss_;
    // Return true so that the in-order packet logic is skipped
    return true;
  }
  // A normally increasing packet: return false
  if (sequence_number > received_seq_max_)
    return false;
  // Reaching here means the packet is out of order
  return true;
}


The Chrome source is a little harder to follow than licode's.
Chrome's packet loss logic is:

  1. When a packet arrives, first subtract 1 from cumulative_loss_.
  2. Check whether the packet is in order. If it is out of order (seq < max seq), return. If it is in order, continue.
  3. Add the gap between this seq number and the previous maximum. If the seq number increased by exactly 1, this cancels the subtraction from step 1.
    So Chrome does handle out-of-order packets.
    Seq jump packets are handled as follows (the changes to cumulative_loss_ are left out here, because including them makes the description very confusing):
  4. The current packet is a seq jump packet. Record its seq number (call it unorder seq).
  5. Receive the next packet. Check whether its seq number equals unorder seq + 1. If so, the seq numbering really did jump, and received_seq_max_ must be adjusted so that later loss counts are correct.
  6. If its seq number does not equal unorder seq + 1, the seq jump packet was a bad packet. received_seq_max_ is left unchanged and counting continues with the current seq number.
    This is a bit convoluted. If it is unclear, read my code comments or the source code a couple of times: [receive_statistics_impl.cc source code](https://webrtc.googlesource.com/src/+/refs/heads/main/modules/rtp_rtcp/source/receive_statistics_impl.cc)
    Chrome's implementation is actually very rigorous, but it still does not correct for duplicate packets in the RR, or for wrong seq jump packets. Is Chrome's code wrong? To find out, I looked up the description of cumulative number of packets lost in the RFC.
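Steps 1 to 3 can be modeled in a few lines of C++. This is a deliberately simplified sketch, not the WebRTC code: `SimpleStatistician` and `OnPacket` are hypothetical names, the sequence number is assumed to be already unwrapped to 64 bits, and the seq jump handling of steps 4 to 6 is left out entirely.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of steps 1-3 only (no stream-restart handling).
struct SimpleStatistician {
  bool received_any = false;
  int64_t received_seq_max = 0;
  int64_t cumulative_loss = 0;

  void OnPacket(int64_t sequence_number) {
    // Step 1: every arrival counts as one received packet.
    --cumulative_loss;
    if (!received_any) {
      received_any = true;
      received_seq_max = sequence_number - 1;
    } else if (sequence_number <= received_seq_max) {
      // Step 2: late or duplicate packet, nothing more to do.
      return;
    }
    // Step 3: add the gap to the previous maximum. A gap of 1 cancels
    // step 1; a larger gap records the missing packets.
    cumulative_loss += sequence_number - received_seq_max;
    received_seq_max = sequence_number;
  }
};
```

For the arrivals 6, 7, 9, 11, 10 the counter ends at 1: the late packet 10 takes back the loss that was provisionally counted for it, and only packet 8 remains missing. Each duplicate after that lowers the counter by one more, so two further copies of packet 10 leave it at -1.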

Definition in rfc

The RFC describes it as follows (full document: [rfc3550](https://datatracker.ietf.org/doc/html/rfc3550#section-6.4.1)):

cumulative number of packets lost: 24 bits
      The total number of RTP data packets from source SSRC_n that have
      been lost since the beginning of reception.  This number is
      defined to be the number of packets expected less the number of
      packets actually received, where the number of packets received
      includes any which are late or duplicates.  Thus, packets that
      arrive late are not counted as lost, and the loss may be negative
      if there are duplicates.  The number of packets expected is
      defined to be the extended last sequence number received, as
      defined next, less the initial sequence number received.  This may
      be calculated as shown in Appendix A.3.

The key sentence is: This number is defined to be the number of packets expected less the number of packets actually received, where the number of packets received includes any which are late or duplicates.
That is, the value is the number of packets expected minus the number of packets actually received, and the number actually received includes late and duplicate packets.
So the cumulative number of packets lost is not a true count of lost packets. It is just a difference, and it can be negative.
The mystery is finally solved.
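As a rough illustration of that definition, the sketch below computes both RR fields along the lines of RFC 3550 Appendix A.3. The names `LossFields` and `ComputeLossFields` are hypothetical. The clamp to the signed 24-bit range follows the appendix text; capping the fraction at 255 is an assumption of this sketch, added so that losing every packet in an interval does not overflow the 8-bit field.

```cpp
#include <cassert>
#include <cstdint>

struct LossFields {
  int32_t cumulative_lost;  // signed, clamped to the 24-bit range
  uint8_t fraction_lost;    // fixed point: loss fraction multiplied by 256
};

// `expected` and `received` are running totals; the `_prior` values are the
// same totals as of the previous report.
LossFields ComputeLossFields(uint32_t expected, uint32_t received,
                             uint32_t expected_prior, uint32_t received_prior) {
  assert(expected >= expected_prior && received >= received_prior);

  // Cumulative loss is a plain difference and may be negative.
  int64_t lost = static_cast<int64_t>(expected) - received;
  if (lost > 0x7fffff) lost = 0x7fffff;
  if (lost < -0x800000) lost = -0x800000;

  const int64_t expected_interval =
      static_cast<int64_t>(expected) - expected_prior;
  const int64_t received_interval =
      static_cast<int64_t>(received) - received_prior;
  const int64_t lost_interval = expected_interval - received_interval;

  int64_t fraction = 0;
  if (expected_interval != 0 && lost_interval > 0) {
    fraction = (lost_interval << 8) / expected_interval;
    if (fraction > 255) fraction = 255;  // assumption: cap instead of wrapping
  }
  return {static_cast<int32_t>(lost), static_cast<uint8_t>(fraction)};
}
```

With 6 packets expected and 5 received, the cumulative value is 1 and the fraction is 256 / 6, which truncates to 42. With 6 expected and 8 received because of duplicates, the cumulative value is -2 and the fraction is 0.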

Keywords: webrtc rtc

Added by nitram on Tue, 04 Jan 2022 07:43:27 +0200