# ARM How to invoke branching?

I was looking through some code that contains a loop.

```
loopinner ....
SUBS R2,R2,#1 ; j--
BGT loopinner ;in this case, loop should continue when j>1
```

In this case, I am not sure how BGT branches back to loopinner. Don't I need to specify what it is greater than? Since SUBS sets the flags, suppose j-- leaves the value 1. How does the branch know what value it is greater than?

## Answers 3

From ARM conditionals you can readily find that the instruction examines the Z, N, and V status flags and branches when Z=0 & N=V. Since it examines the V status flag and not the C status flag, this is clearly intended as a *signed* test. (This means to me that this isn't useful for unsigned loop control, FYI.)

I wrote this not so long ago, with enough information to understand what's going on. But I can summarize it here.

Let's use simpler 4-bit words where there are only 16 symbols:
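
A table of this kind can be regenerated with a few lines of Python. This is only a sketch: the column layout (bit pattern, signed value, bit-inverted form) is my assumption about what such a table holds, and the names `to_signed`, `inverted` and `table_rows` are made up for the illustration.

```python
# Regenerate a 4-bit table: bit pattern, signed value, and the
# bit-inverted form the ALU adds when this value is the subtrahend.
# The column choice is an assumption made for this sketch.
WIDTH = 4
MASK = (1 << WIDTH) - 1  # 0b1111


def to_signed(bits):
    """Interpret a 4-bit pattern as a two's-complement signed value."""
    return bits - (1 << WIDTH) if bits & (1 << (WIDTH - 1)) else bits


def inverted(bits):
    """Invert every bit, staying within 4 bits."""
    return ~bits & MASK


def table_rows():
    """One row per 4-bit symbol: (pattern, signed value, inverted pattern)."""
    return [(format(b, "04b"), to_signed(b), format(inverted(b), "04b"))
            for b in range(1 << WIDTH)]


for pattern, signed, inv in table_rows():
    print(f"{pattern}  {signed:3d}  {inv}")
```

With this layout, the third printed column is the inverted pattern that gets added (together with a carry-in of 1) when the row's value is subtracted.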

Above, the third column is what the ALU *actually* uses when subtracting by that value. It simply inverts each bit before adding. (The ALU *never* subtracts anything. It doesn't even know how.) So, the SUB instruction actually performs addition, using the subtrahend form of the value when adding. (If you want to understand status bit semantics, it's pretty important that you master this concept as it will help you when you'd otherwise be confused.) Stamp it onto your forehead:

A CPU ONLY ADDS. IT CANNOT SUBTRACT.

If you ever feel the temptation to go down the primrose path of believing that any kind of subtract instruction *actually* subtracts, and this includes all comparison instructions that set status bits but don't change register values, just kick yourself really hard, really fast. It doesn't happen.

A CPU ONLY ADDS. IT CANNOT SUBTRACT.

Everything has to be cast into addition semantics. Everything.

A `SUBS R2, R2, #1`, in this 4-bit universe I just created, would add 1110 plus a carry-in of 1, as well. There are only 16 possibilities:

Under *Operation Result* I have a column for *ALU*. The *ALU* field is what goes back into R2 after the SUBS instruction completes. (The *V* status flag is generated by an XOR of the carry-out of the next-to-most significant bit during the operation and the carry bit itself.) Note also that there is a single case marked with *E* where a signed overflow occurred.

You can now easily see why the BGT instruction applies those particular status bits in exactly the way it does. Admittedly, this uses 4-bit words. But the exact same idea applies to much wider word sizes, without any change to it.
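
The 16 cases can also be worked out mechanically. The Python sketch below models the 4-bit `SUBS R2, R2, #1` as an addition of 1110 with a carry-in of 1 and derives N, Z, C and V as described above. The function names (`subs_4bit`, `bgt_taken`) are invented for this illustration, and the printed layout is not meant to match any particular table.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def subs_4bit(r2, imm=1):
    """Model SUBS as an addition: r2 + NOT(imm) + carry-in of 1."""
    total = (r2 & MASK) + (~imm & MASK) + 1
    result = total & MASK
    n = result >> (WIDTH - 1)   # sign bit of the result
    z = int(result == 0)
    c = total >> WIDTH          # carry out of the most significant bit
    # Carry into the most significant bit comes from adding the low bits.
    low = MASK >> 1
    carry_into_msb = ((r2 & low) + (~imm & low) + 1) >> (WIDTH - 1)
    v = carry_into_msb ^ c      # signed overflow
    return result, n, z, c, v


def bgt_taken(r2):
    """BGT condition after the SUBS: Z == 0 and N == V."""
    _, n, z, _, v = subs_4bit(r2)
    return z == 0 and n == v


for r2 in range(1 << WIDTH):
    result, n, z, c, v = subs_4bit(r2)
    print(f"R2={r2:04b} ALU={result:04b} N={n} Z={z} C={c} V={v} "
          f"BGT={'taken' if bgt_taken(r2) else 'not taken'}")
```

In this model the only row with V set is R2 = 1000 (signed -8), and the branch is taken only for R2 = 0010 through 0111.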

Looking back at the table, you can see that the condition is *True* if and only if R2 was *2 or greater* before the subtraction, and not 1 or 0 or smaller.

Your question:

Let's start with the following table from the ARMv6-M Architecture Reference Manual, page A6-99:

The `GT` condition is described as "Signed greater than". The reason the documentation doesn't specify a constant is that this test occurs *after* some prior instruction. That prior instruction defines the context. But without having that context, all that can be said is a general signed >.

So, if the prior instruction were CMP:

Then the context would be the comparison of two signed values and the BGT instruction would then mean "branch when signed operand 1 is greater than signed operand 2."

But in your case, with "SUBS R2, R2, #1" the context changes and the BGT instruction would then mean "branch while signed R2 still remains greater than 0."
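
To make the two contexts concrete, here is a Python sketch of a 32-bit flag model. It assumes both CMP and SUBS set N, Z, C and V from `a + NOT(b) + 1`, and that BGT only looks at those flags. The helpers `flags_of_sub`, `gt` and `loop_iterations` are hypothetical names for this example.

```python
BITS = 32
MASK = (1 << BITS) - 1


def flags_of_sub(a, b):
    """Flags (N, Z, C, V) of a - b, computed as a + NOT(b) + 1."""
    a &= MASK
    nb = ~b & MASK
    total = a + nb + 1
    result = total & MASK
    n = result >> (BITS - 1)
    z = int(result == 0)
    c = total >> BITS
    # Signed overflow: both addends have a sign the result does not share.
    v = ((a ^ result) & (nb ^ result)) >> (BITS - 1)
    return n, z, c, v


def gt(n, z, c, v):
    """The GT condition: Z == 0 and N == V. C is ignored."""
    return z == 0 and n == v


def loop_iterations(j):
    """Count body executions of: loopinner ... SUBS R2,R2,#1 / BGT loopinner."""
    count = 0
    while True:
        count += 1                 # loop body runs
        flags = flags_of_sub(j, 1)  # SUBS R2, R2, #1 sets the flags
        j -= 1
        if not gt(*flags):          # BGT loopinner
            return count


# After "CMP a, b", BGT means: signed a > signed b.
print(gt(*flags_of_sub(5, 3)), gt(*flags_of_sub(-1, 1)))
# After "SUBS R2, R2, #1", BGT means: the new R2 is still > 0.
print(loop_iterations(3))
```

The same `gt` function serves both cases; only the instruction that produced the flags changes what the branch means.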

The conditional branch instruction itself doesn't actually *know* what the prior instruction was. It also doesn't even know what register(s) are involved. That knowledge is left to the individual (or compiler) that is generating the instruction stream. So the branch instruction doesn't actually have a fixed constant value, nor does it have a register with which to compare against. It depends entirely upon what earlier instructions did with the status bits. It just examines the resulting status and then does what it does. It's up to you to know the context and to use it, correctly.

(Speaking of which, the source code comment may be misleading or wrong.)

## Note

Elliot takes issue (see discussion below) without evidence. He writes,

"I could equivalently argue that a CPU can only subtract."

He can make that argument, but it is only academic. The actual fact of the matter is that CPUs don't subtract. They add.

So while this is partly my response, providing clear, unequivocal evidence in support so that even Elliot can understand the situation on the ground, today, it's also an excellent segue, too. So I'm very glad for the opportunity Elliot affords me in expanding the discussion.

My first CPU was made from 7400 parts that I built and successfully completed in 1974. Newspaper reporters, to my surprise, showed up and wrote an article about it. That's my first experience. Since then, I professionally worked at Intel doing chipset testing for the BX chipset and, as a matter of relevance to teaching this subject, I've taught Computer Architecture classes as an adjunct professor at Portland State University in the 1990's, with class sizes of approximately 65-75 students. This is the largest 4-year university in the State of Oregon.

I feel equivocation (expressing ambivalence about how computations *might* be done) about how processors generate their status bits and how they compute only leads students into unnecessary uncertainty, confusion and difficulty that can take hours, weeks, months and sometimes even years to correct. Just as teaching group-theoretic abstract algebra before getting the basics across would confuse most first-year algebra students, so also would teaching academic abstractions about how computers *could* do things. More students would be damaged than helped.

The simple truth is that instruction decoding emits an ADD, even when the instruction text (it's just text, after all; it's not what is actually going on) says SUB. The decoding still issues an ADD. It just modifies some operand details along the way.

Similarly, as it must also be in the case of the ARM processor, the above theory is all you need to understand how things are actually done.

Please don't confuse yourself! Computers add. They don't subtract. They just fiddle around a bit to make it look like they subtract.

For good or bad, it's important to understand what a computer *actually* does in order to understand certain status bits; what they do and why they do it. There's no other way around it. The above theoretical model *is* the way things work in modern processors and it *is* how to work out and understand the status bits, correctly. There *is* a good reason why things are the way they are.

It's my hope that these details, above, and those I'll write below will be useful. Any failure to communicate here is mine and I'll gladly work to repair, amend, and improve this document where I may.

To continue, I'll be using the ARMv6-M Architecture Reference Manual as a reference.

Let's start on page A6-187 (register case):

Here, you can see that they clearly document this behavior:

This is an addition, with operand 2 (the subtrahend) inverted and the carry-in set to '1'. Just as I wrote happens, above. (It's just how it is done.)
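
As a rough Python sketch of that behavior: the `add_with_carry` helper below follows the common formulation of comparing an unsigned sum and a signed sum against the truncated result. It is my own paraphrase for illustration, not a copy of the manual's pseudocode, and `subs` is a hypothetical wrapper name.

```python
def add_with_carry(x, y, carry_in, n=32):
    """Add two n-bit patterns plus a carry-in.

    Returns (result, carry_out, overflow).
    """
    mask = (1 << n) - 1

    def sint(v):
        # Two's-complement reading of an n-bit pattern.
        return v - (1 << n) if v >> (n - 1) else v

    x &= mask
    y &= mask
    unsigned_sum = x + y + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & mask
    carry_out = int(result != unsigned_sum)        # unsigned sum did not fit
    overflow = int(sint(result) != signed_sum)     # signed sum did not fit
    return result, carry_out, overflow


def subs(rn, op2, n=32):
    """SUBS as an addition: operand 2 inverted, carry-in fixed at 1."""
    mask = (1 << n) - 1
    return add_with_carry(rn, ~op2 & mask, 1, n)


print(subs(2, 1))           # small positive case
print(subs(0x80000000, 1))  # most negative value minus one
```

Note that nothing in `subs` performs a subtraction; the inversion and the carry-in do all the work.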

In the case of multi-word extensions, go to page A6-173, and find SBCS:

Here note that they again use addition:

Instead of the carry-in being a hard-coded '1', as it is for the SUBS instruction, it's now using the last-saved carry-out value. In this case, it's usually expected that this will be the carry-out from a prior SUBS (or SBCS) instruction.

For multi-word operations, one starts with SUBS (or ADDS) and then continues the process with subsequent SBCS (or ADCS), which use the carry-out of earlier instructions to support a multi-word operation.

In multi-word addition, this carry-out can be thought of just as a *carry-out*, which it is. A '1' indicates that a carry occurred and needs to be dealt with. A '0' indicates no carry occurred.

In the case of multi-word subtraction, this carry-out is better seen as an inverted *borrow-from*. A '1' indicates that there was no need to borrow from a higher-order word. A '0' indicates that there is a need to borrow. Since a SUBS instruction always uses a carry-in of '1', this means there's no borrow (the subtraction result requires an 'increment' in order to compensate for the inverted operand 2.) But for the SBCS instruction, if APSR.C is a '0', then no 'increment' takes place and this is the same as borrowing (since an increment is required, if there is no borrow.)

The ADCS instruction, found on page A6-106 but not displayed here, also uses the carry-out of prior instruction executions. It doesn't invert the carry-out value or otherwise do something weird or different, just because it is an ADCS instruction. It does exactly the same thing as the SBCS instruction except for one, and only one, minor detail: the SBCS instruction will invert operand 2 and ADCS won't. That's it.

This is one of the really cool aspects about the way these details work. Very little added logic is required to turn an addition into a subtraction and/or a multi-word addition into a multi-word subtraction.
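
The multi-word case can be sketched in Python as two chained 32-bit additions. This is an illustration under the assumptions described above (SUBS uses a fixed carry-in of 1, SBCS and ADCS use the previous carry-out); `add_carry`, `sub64` and `add64` are names invented for the example.

```python
MASK32 = 0xFFFFFFFF


def add_carry(x, y, carry_in):
    """32-bit add with carry-in; returns (result, carry_out)."""
    total = (x & MASK32) + (y & MASK32) + carry_in
    return total & MASK32, total >> 32


def split(value):
    """Split a 64-bit value into (low word, high word)."""
    return value & MASK32, (value >> 32) & MASK32


def sub64(a, b):
    """64-bit a - b from two 32-bit steps; returns (result, final carry)."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo, c = add_carry(a_lo, ~b_lo, 1)  # SUBS: inverted operand, carry-in 1
    hi, c = add_carry(a_hi, ~b_hi, c)  # SBCS: inverted operand, carry-in C
    return (hi << 32) | lo, c


def add64(a, b):
    """64-bit a + b from two 32-bit steps; returns (result, final carry)."""
    a_lo, a_hi = split(a)
    b_lo, b_hi = split(b)
    lo, c = add_carry(a_lo, b_lo, 0)   # ADDS: carry-in 0
    hi, c = add_carry(a_hi, b_hi, c)   # ADCS: carry-in C
    return (hi << 32) | lo, c


print(hex(sub64(0x1_0000_0000, 1)[0]))
print(hex(add64(0xFFFFFFFF, 1)[0]))
```

In the first call the low-word step leaves C = 0 (a borrow is needed), so the high-word step gets no 'increment' and the borrow propagates. The only difference between `sub64` and `add64` is the inversion of operand 2 and the initial carry-in.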

And finally, to complete the story, see page A2-35:

Consistent with my descriptions of how things actually do work, above.

It's really a pleasure to see how all this works. It's worth some time playing with different signed and unsigned values and, by hand, setting and using status flags. It really deepens these ideas. And they are very good ones!

All of the above is about understanding the status bits and how they are generated and why they are generated in the way that they are. If you focus on what actually *happens* in a CPU, the rest just falls out as the necessary consequences and it's very easy to understand, then.

A CPU only adds. It cannot subtract.

In your code example the `SUB` instruction has an `S` suffix. This means that the sub instruction will set the condition flags, which the `BGT` will evaluate. For the branch to be taken, the `Z` flag must be 0, and the `N` flag must equal `V`.

The ARM documentation clearly states that GT is a signed greater than; it will branch when Z == 0 and N == V.

Take the case when r2 = 2. Remember from grade school that x - y = x + (-y), and from day one (or shortly thereafter) in computer engineering/science/whatever that two's complement negation is invert and add one, so x - y = x + (~y) + 1. This saves on logic and is how we do subtraction.

Four bits is more than enough to see what is going on; the result is the same as with 32 bits.

So N = 0 and Z = 0 from the result. The carry in and carry out of the msbit are the same so V = 0 (xor of the carry in and carry out of the msbit, can also do it by inspection of the msbits of the operands and result).

We need Z == 0 and N == V to do the branch, and they are, so the branch happens.

You will find this is the case for positive numbers, since this is a signed greater than. If you wanted an unsigned test, use bhi (unsigned higher: C == 1 and Z == 0) or bcs/bhs (unsigned higher or same: C == 1). The logic works the same; it just uses the carry out (plus Z for bhi) instead of N and V (you can see this as well if you look at the table jonk generated, or generate one yourself).

When r2 = 1

Z = 1, N = 0, V = 0

N == V but Z != 0 so the branch does not happen.
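
The two cases above can be checked bit by bit with a small ripple-carry model in Python. This is only a sketch of the four-bit arithmetic described here; `ripple_add` and `subs_flags` are names made up for the example.

```python
def ripple_add(x, y, carry_in, width=4):
    """Add bit by bit; carries[i] is the carry into bit i,
    and carries[width] is the carry out of the msbit."""
    carries = [carry_in]
    result = 0
    for i in range(width):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        s = xi + yi + carries[i]
        result |= (s & 1) << i
        carries.append(s >> 1)
    return result, carries


def subs_flags(r2, width=4):
    """r2 - 1 done as r2 + (~1) + 1; returns (result, N, Z, V)."""
    mask = (1 << width) - 1
    result, carries = ripple_add(r2, ~1 & mask, 1, width)
    n = result >> (width - 1)
    z = int(result == 0)
    v = carries[width - 1] ^ carries[width]  # carry in xor carry out of msbit
    return result, n, z, v


for r2 in (2, 1):
    result, n, z, v = subs_flags(r2)
    taken = z == 0 and n == v
    print(f"r2={r2}: result={result:04b} N={n} Z={z} V={v} branch={taken}")
```

For r2 = 2 the carry into and out of the msbit are both 1, so V = 0 and the branch is taken. For r2 = 1 the result is zero, so Z = 1 and it is not.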