ARM How to invoke branching?

I was looking through some code that contains a loop.

loopinner ....
          SUBS R2,R2,#1 ; j--
          BGT loopinner ;in this case, loop should continue when j>1

In this case, I am not sure how BGT branches to loopinner again. Don't I need to specify what it is greater than? Since SUBS sets the flags, let's say j-- becomes the value 1. How does the branch know what value it is greater than?

Answers 3

  • From ARM conditionals you can readily find that the instruction examines the Z, N, and V status flags and branches when Z=0 & N=V. Since it examines the V status flag and not the C status flag, this is clearly intended as a signed test. (This means to me that this isn't useful for unsigned loop control -- FYI.)

    I wrote this not so long ago, with enough information to understand what's going on. But I can summarize it here.

    Let's use simpler 4-bit words where there are only 16 symbols:

    Word     Signed     Subtrahend
    0000         0         1111
    0001         1         1110
    0010         2         1101
    0011         3         1100
    0100         4         1011
    0101         5         1010
    0110         6         1001
    0111         7         1000
    1000        -8         0111
    1001        -7         0110
    1010        -6         0101
    1011        -5         0100
    1100        -4         0011
    1101        -3         0010
    1110        -2         0001
    1111        -1         0000

    Above, the third column is what the ALU actually uses when subtracting by that value. It simply inverts each bit before adding. (The ALU never subtracts anything. It doesn't even know how.) So, the SUB instruction actually performs addition, using the subtrahend form of the value when adding. (If you want to understand status bit semantics, it's pretty important that you master this concept as it will help you when you'd otherwise be confused.)

    Stamp it onto your forehead --


    If you ever feel the temptation to go down the primrose path of believing that any kind of subtract instruction actually subtracts, and this includes all comparison instructions that set status bits but don't change register values, just kick yourself really hard, really fast. It doesn't happen.


    Everything has to be cast into addition semantics. Everything.
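As a minimal sketch of the point above, the following Python model (my own illustration, not taken from any ARM documentation) performs 4-bit subtraction using only bit inversion and addition. The names `sub_via_add`, `BITS` and `MASK` are hypothetical helpers chosen for this example.

```python
# 4-bit model: subtraction done as addition of the inverted operand plus 1.
BITS = 4
MASK = (1 << BITS) - 1  # 0b1111

def sub_via_add(x, y):
    """Return (x - y) mod 16 using only inversion and addition."""
    inverted = ~y & MASK  # the "Subtrahend" column of the table above
    return (x + inverted + 1) & MASK

# Exhaustive check against ordinary modular subtraction.
for x in range(16):
    for y in range(16):
        assert sub_via_add(x, y) == (x - y) & MASK

print(format(sub_via_add(0b0010, 0b0001), "04b"))  # 2 - 1
```

The exhaustive loop passes for all 256 operand pairs, which is the whole argument in miniature: no subtractor is needed.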

    A SUBS R2, R2, #1, in this 4-bit universe I just created, would add 1110 plus a carry-in of 1, as well. There are only 16 possibilities:

    Actual Operation    Operation Result    Operation      Comparison
     R2     SUBS OP        Z N V C ALU       Semantics      Semantics   Z=0 & N=V?
    0000 + 1110 + 1        0 1 0 0 1111     0 - 1 = -1       0 > 1 ?    False
    0001 + 1110 + 1        1 0 0 1 0000     1 - 1 =  0       1 > 1 ?    False
    0010 + 1110 + 1        0 0 0 1 0001     2 - 1 =  1       2 > 1 ?    True
    0011 + 1110 + 1        0 0 0 1 0010     3 - 1 =  2       3 > 1 ?    True
    0100 + 1110 + 1        0 0 0 1 0011     4 - 1 =  3       4 > 1 ?    True
    0101 + 1110 + 1        0 0 0 1 0100     5 - 1 =  4       5 > 1 ?    True
    0110 + 1110 + 1        0 0 0 1 0101     6 - 1 =  5       6 > 1 ?    True
    0111 + 1110 + 1        0 0 0 1 0110     7 - 1 =  6       7 > 1 ?    True
    1000 + 1110 + 1        0 0 1 1 0111    -8 - 1 = -9 E    -8 > 1 ?    False
    1001 + 1110 + 1        0 1 0 1 1000    -7 - 1 = -8      -7 > 1 ?    False
    1010 + 1110 + 1        0 1 0 1 1001    -6 - 1 = -7      -6 > 1 ?    False
    1011 + 1110 + 1        0 1 0 1 1010    -5 - 1 = -6      -5 > 1 ?    False
    1100 + 1110 + 1        0 1 0 1 1011    -4 - 1 = -5      -4 > 1 ?    False
    1101 + 1110 + 1        0 1 0 1 1100    -3 - 1 = -4      -3 > 1 ?    False
    1110 + 1110 + 1        0 1 0 1 1101    -2 - 1 = -3      -2 > 1 ?    False
    1111 + 1110 + 1        0 1 0 1 1110    -1 - 1 = -2      -1 > 1 ?    False

    Under Operation Result I have a column for ALU. The ALU field is what goes back into R2 after the SUBS instruction completes. (The V status flag is generated by an XOR of the carry-out of the next-to-most significant bit during the operation and the carry bit itself.) Note also that there is a single case marked with E where a signed overflow occurred.

    You can now easily see why the BGT instruction applies those particular status bits in exactly the way it does. Admittedly, this uses 4-bit words. But the exact same idea applies to much wider word sizes, without any change to it.

    Looking back at the table, you can see that the condition is True if and only if R2 was 2 or greater before the subtraction, and not 1 or 0 or smaller.
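If you would like to regenerate the flag columns yourself, here is a rough Python sketch of the 4-bit model. It is an illustration under my own assumptions (the helper names `subs_flags`, `to_signed` and `gt` are made up for this example), with V computed by comparing against the true signed result rather than by the carry XOR.

```python
BITS = 4
MASK = (1 << BITS) - 1

def to_signed(w):
    """Interpret a 4-bit word as a two's complement value."""
    return w - (1 << BITS) if w & (1 << (BITS - 1)) else w

def subs_flags(x, y):
    """Model SUBS as x + NOT(y) + 1; return (result, Z, N, V, C)."""
    total = x + (~y & MASK) + 1
    result = total & MASK
    z = int(result == 0)
    n = result >> (BITS - 1)
    c = total >> BITS  # carry-out of the 4-bit addition
    v = int(to_signed(result) != to_signed(x) - to_signed(y))  # signed overflow
    return result, z, n, v, c

def gt(z, n, v):
    """The BGT condition: Z=0 and N=V."""
    return z == 0 and n == v

for r2 in range(16):
    result, z, n, v, c = subs_flags(r2, 1)
    print(f"{r2:04b}  Z={z} N={n} V={v} C={c}  ALU={result:04b}  GT={gt(z, n, v)}")
    # GT holds exactly when the signed value of R2 was greater than 1.
    assert gt(z, n, v) == (to_signed(r2) > 1)
```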

    Your question:

    Don't I need to specify what it is greater than? Since SUBS sets the flags, let's say j-- becomes the value 1. How does the branch know what value it is greater than?

    Let's start with the following table from the ARMv6-M Architecture Reference Manual, page A6-99:

    [Image: condition code table, ARMv6-M Architecture Reference Manual, page A6-99]

    The GT condition is described as "Signed greater than". The reason the documentation doesn't specify a constant is that this test occurs after some prior instruction. That prior instruction defines the context. But without having that context, all that can be said is a general signed >.

    So, if the prior instruction were CMP:

    [Image: CMP instruction description, ARMv6-M Architecture Reference Manual]

    Then the context would be the comparison of two signed values and the BGT instruction would then mean "branch when signed operand 1 is greater than signed operand 2."

    But in your case, with "SUBS R2, R2, #1" the context changes and the BGT instruction would then mean "branch while signed R2 still remains greater than 0."

    The conditional branch instruction itself doesn't actually know what the prior instruction was. It also doesn't even know what register(s) are involved. That knowledge is left to the individual (or compiler) that is generating the instruction stream. So the branch instruction doesn't actually have a fixed constant value, nor does it have a register with which to compare against. It depends entirely upon what earlier instructions did with the status bits. It just examines the resulting status and then does what it does. It's up to you to know the context and to use it, correctly.
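To make the "context" idea concrete, here is a small Python sketch in the same 4-bit universe. It is only a model built on my own assumptions: `subtract`, `gt` and `loop_iterations` are hypothetical names, and the loop function imitates the question's `SUBS R2,R2,#1` / `BGT loopinner` pair as a do-while. The same `gt` predicate, which sees nothing but the flags, means "a > b" after a compare and "keep looping" after the decrement.

```python
BITS = 4
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def to_signed(w):
    """Interpret a 4-bit word as a two's complement value."""
    return w - (1 << BITS) if w & SIGN else w

def subtract(x, y):
    """Return (result, (Z, N, V)) for x + NOT(y) + 1, as CMP and SUBS compute it."""
    result = (x + (~y & MASK) + 1) & MASK
    z = int(result == 0)
    n = int((result & SIGN) != 0)
    v = int(to_signed(result) != to_signed(x) - to_signed(y))
    return result, (z, n, v)

def gt(flags):
    """The GT predicate sees only the flags, never the operands."""
    z, n, v = flags
    return z == 0 and n == v

# Context 1: CMP a, b discards the result; GT then means signed a > signed b.
for a in range(16):
    for b in range(16):
        _, flags = subtract(a, b)
        assert gt(flags) == (to_signed(a) > to_signed(b))

# Context 2: the loop from the question, run as a do-while on R2.
def loop_iterations(j):
    r2, count = j & MASK, 0
    while True:
        count += 1                   # loop body
        r2, flags = subtract(r2, 1)  # SUBS R2, R2, #1
        if not gt(flags):            # BGT loopinner
            return count

print([loop_iterations(j) for j in (1, 2, 3, 7)])  # [1, 2, 3, 7]
```

Note that the body always runs at least once, so a starting value of 0 or a negative value still gives one iteration in this model.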

    (Speaking of which, the source code comment may be misleading or wrong.)


    Elliot takes issue (see discussion below) without evidence. He writes, "I could equivalently argue that a CPU can only subtract." He can make that argument, but it is only academic. The actual fact of the matter is that CPUs don't subtract. They add.

    So while this is partly my response, providing clear, unequivocal evidence in support so that even Elliot can understand the situation on the ground, today, it's also an excellent segue, too. So I'm very glad for the opportunity Elliot affords me in expanding the discussion.

    My first CPU was one I built from 7400 parts and successfully completed in 1974. Newspaper reporters, to my surprise, showed up and wrote an article about it. That's my first experience. Since then, I professionally worked at Intel doing chipset testing for the BX chipset and, as a matter of relevance to teaching this subject, I taught Computer Architecture classes as an adjunct professor at Portland State University in the 1990s, with class sizes of approximately 65-75 students. This is the largest 4-year university in the State of Oregon.

    I feel equivocation (expressing ambivalence about how computations might be done) about how processors generate their status bits and how they compute only leads students into unnecessary uncertainty, confusion and difficulty that can take hours, weeks, months and sometimes even years to correct. Just as teaching group-theoretic abstract algebra before getting the basics across would confuse most first-year algebra students, so also would teaching academic abstractions about how computers could do things. More students would be damaged, than helped.

    The simple truth is that instruction decoding emits an ADD, even when the instruction text (it's just text, after all -- it's not what is actually going on) says SUB. The decoding still issues an ADD. It just modifies some operand details along the way.

    Similarly, as it must also be in the case of the ARM processor, the above theory is all you need to understand how things are actually done.

    Please don't confuse yourself! Computers add. They don't subtract. They just fiddle around a bit to make it look like they subtract.

    For good or bad, it's important to understand what a computer actually does in order to understand certain status bits; what they do and why they do it. There's no other way around it. The above theoretical model is the way things work in modern processors and it is how to work out and understand the status bits, correctly. There is a good reason why things are the way they are.

    It's my hope that these details, above, and those I'll write below will be useful. Any failure to communicate here is mine and I'll gladly work to repair, amend, and improve this document where I may.

    To continue, I'll be using the ARMv6-M Architecture Reference Manual as a reference.

    Let's start on page A6-187 (register case):

    [Image: SUBS (register) instruction description, page A6-187]

    Here, you can see that they clearly document this behavior:

    AddWithCarry(R[n], NOT(shifted), '1')

    This is an addition, with operand 2 (the subtrahend) inverted and the carry-in set to '1'. Just as I wrote happens, above. (It's just how it is done.)

    In the case of multi-word extensions, go to page A6-173, and find SBCS:

    [Image: SBCS instruction description, page A6-173]

    Here note that they again use addition:

    AddWithCarry(R[n], NOT(shifted), APSR.C)

    Instead of the carry-in being a hard-coded '1', as it is for the SUBS instruction, it's now using the last-saved carry-out value. In this case, it's usually expected that this will be the carry-out from a prior SUBS (or SBCS) instruction.

    For multi-word operations, one starts with SUBS (or ADDS) and then continues the process with subsequent SBCS (or ADCS), which use the carry-out of earlier instructions to support a multi-word operation.

    In multi-word addition, this carry-out can be thought of just as a carry-out, which it is. A '1' indicates that a carry occurred and needs to be dealt with. A '0' indicates no carry occurred.

    In the case of multi-word subtraction, this carry-out is better seen as an inverted borrow. A '1' indicates that there was no need to borrow from a higher-order word. A '0' indicates that there is a need to borrow. Since a SUBS instruction always uses a carry-in of '1', it starts out with no borrow (the subtraction result requires an 'increment' in order to compensate for the inverted operand 2). But for the SBCS instruction, if APSR.C is a '0', then no 'increment' takes place and this is the same as borrowing (since an increment is required if there is no borrow).

    The ADCS instruction, found on page A6-106 but not displayed here, also uses the carry-out of prior instruction executions. It doesn't invert the carry-out value or otherwise do something weird or different, just because it is an ADCS instruction. It does exactly the same thing as the SBCS instruction except and only for one minor detail -- the SBCS instruction will invert operand 2 and ADCS won't. That's it.

    This is one of the really cool aspects about the way these details work. Very little added logic is required to turn an addition into a subtraction and/or a multi-word addition into a multi-word subtraction.
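The SUBS-then-SBCS chain described above can be sketched in Python as follows. This is a simplified model under my own assumptions, not the manual's pseudocode: `add_with_carry` here returns only the result and carry-out, and `sub64` is a hypothetical helper that subtracts two 64-bit values one 32-bit word at a time.

```python
WORD = 32
MASK = (1 << WORD) - 1

def add_with_carry(x, y, carry_in):
    """Return (result, carry_out) of a WORD-bit addition with carry-in."""
    total = x + y + carry_in
    return total & MASK, total >> WORD

def sub64(a, b):
    """Compute (a - b) mod 2**64 one 32-bit word at a time."""
    a_lo, a_hi = a & MASK, (a >> WORD) & MASK
    b_lo, b_hi = b & MASK, (b >> WORD) & MASK
    # SUBS: low words, operand 2 inverted, carry-in hard-coded to 1.
    lo, carry = add_with_carry(a_lo, ~b_lo & MASK, 1)
    # SBCS: high words, operand 2 inverted, carry-in is the carry-out
    # saved by SUBS (1 = no borrow, 0 = borrow).
    hi, carry = add_with_carry(a_hi, ~b_hi & MASK, carry)
    return (hi << WORD) | lo, carry

result, carry = sub64(0x1_0000_0000, 1)
print(hex(result), carry)  # 0xffffffff 1
```

In this example the low-word step produces a carry-out of 0 (a borrow), so the high-word step gets no "increment" and the high word drops from 1 to 0.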

    And finally, to complete the story, see page A2-35:

    [Image: excerpt from the ARMv6-M Architecture Reference Manual, page A2-35]

    Consistent with my descriptions of how things actually do work, above.

    It's really a pleasure to see how all this works. It's worth some time playing with different signed and unsigned values and, by hand, setting and using status flags. It really deepens these ideas. And they are very good ones!

    All of the above is about understanding the status bits and how they are generated and why they are generated in the way that they are. If you focus on what actually happens in a CPU, the rest just falls out as the necessary consequences and it's very easy to understand, then.

    A CPU only adds. It cannot subtract.

  • In your code example the SUB instruction has an S suffix. This means that the instruction will set the condition flags, which the BGT will evaluate. For the branch to be taken, the Z flag must be 0 and the N flag must equal V.

  • The ARM documentation clearly states that GT is a signed greater than; it will branch when Z==0 and N==V.

    When r2 = 2: remember from grade school that x - y = x + (-y), and from day one (or shortly thereafter) in computer engineering/science/whatever that twos complement negation is invert and add one, so x - y = x + (~y) + 1. This saves on logic and is how we do subtraction:

          1  add one
       0010  r2
     + 1110  invert of 0001

    Four bits are more than enough to see what is going on; the result is the same as with 32 bits.

       11101  carries
        0010
      + 1110
      ======
        0001

    So N = 0 and Z = 0 from the result. The carry in and carry out of the msbit are the same so V = 0 (xor of the carry in and carry out of the msbit, can also do it by inspection of the msbits of the operands and result).

    We need Z == 0 and N == V to do the branch, and they are, so the branch happens.

    You will find this is the case for positive numbers, since this is a signed greater than. If you wanted an unsigned comparison, use bhi (unsigned higher: C == 1 and Z == 0) or bcs/bhs (unsigned higher or same: C == 1). The logic works the same; it just relies on the carry out instead of N and V (you can see this as well if you look at the table jonk generated, or generate one yourself).

    When r2 = 1

       11111  carries
        0001
     +  1110
      ======
        0000

    Z = 1, N = 0, V = 0

    N == V but Z != 0 so the branch does not happen.
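The two cases above can be checked bit by bit with a short Python sketch. This is just an illustration under my own assumptions (the function name `subs_with_carries` and its return layout are invented for this example); it performs the 4-bit add one position at a time and derives V from the XOR of the carry into and the carry out of the msbit.

```python
BITS = 4
MASK = (1 << BITS) - 1

def subs_with_carries(x, y):
    """Add x + NOT(y) + 1 bit by bit, recording the carry into each position."""
    inv = ~y & MASK
    carry = 1           # the "add one"
    carries = [carry]   # carries[i] is the carry into bit i
    result = 0
    for i in range(BITS):
        s = ((x >> i) & 1) + ((inv >> i) & 1) + carry
        result |= (s & 1) << i
        carry = s >> 1
        carries.append(carry)  # after the loop, carries[BITS] is the carry out
    z = int(result == 0)
    n = result >> (BITS - 1)
    v = carries[BITS - 1] ^ carries[BITS]  # carry into msbit XOR carry out of msbit
    return result, z, n, v, carries

for r2 in (2, 1):
    result, z, n, v, carries = subs_with_carries(r2, 1)
    taken = z == 0 and n == v
    print(f"r2={r2}: result={result:04b} Z={z} N={n} V={v} branch taken={taken}")
```

For r2 = 2 this reports Z=0, N=0, V=0 and the branch is taken; for r2 = 1 it reports Z=1, so the branch is not taken.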
