FW: FW: [linux-security] Pentium bug makes security under linux/N

Joe Barrera (joebar@MICROSOFT.com)
Wed, 12 Nov 1997 20:38:35 -0800


> -----Original Message-----
> From: William Dixon
> Sent: Wednesday, November 12, 1997 8:28 PM
> To: Windows NT Development Groups
> Cc: 'ads5x@faraday.clas.virginia.edu'
> Subject: FW: FW: [linux-security] Pentium bug makes security under
> linux/NT/etc.
> Importance: Low
>
> FYI from a former research associate still in school. He provides quite a
> bit of detail on the Pentium opcodes which cause the problem.
>
> -----Original Message-----
> From: Aaron Schwartzbard [SMTP:ads5x@faraday.clas.virginia.edu]
> Sent: Wednesday, November 12, 1997 8:00 PM
> To: William Dixon
> Subject: RE: FW: [linux-security] Pentium bug makes security under
> linux/NT/etc.
>
> [William Dixon] [snip initial text...]
>
> All right, we've all heard about the new Intel bug (which is now being
> called the "F0 bug"), but what's the nuts and bolts of it? We all know
> that the problem is caused by trying to execute some bad opcodes. But
> you want to know more about it, which is why you are reading this. If
> the phrase "tries to execute bad opcodes" means nothing to you, then
> you probably won't get much out of this. Otherwise, you're all good.
>
> The bad opcodes are as follows:
>
> f0 0f c7 c8
>
> In binary, that is:
>
> 11110000 00001111 11000111 11001000
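A minimal sketch, in Python (not a language the message uses, picked only for illustration), of how to check that hex-to-binary conversion yourself. The names OPCODE_HEX and to_binary are made up for this example.

```python
# The four bytes from the message, written as a hex string.
OPCODE_HEX = "f0 0f c7 c8"


def to_binary(hex_string):
    """Return each byte of a space-separated hex string as 8 binary digits."""
    return " ".join(format(int(byte, 16), "08b") for byte in hex_string.split())


print(to_binary(OPCODE_HEX))
```

This should print the same four groups of bits shown above.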
>
> So let's break that down into instructions:
>
> 11110000 is the instruction LOCK. It is usually used only on
> multiprocessor systems. Basically, it locks the bus for the next
> instruction. (Actually, it's an instruction prefix, if you want to be
> anal.) You would use it if you are accessing shared memory. You lock
> the bus, then do some sort of memory access, and you can be sure that
> no other processor messed with the shared memory during your memory
> accessing instruction.
>
> So what is the rest? Well, 00001111 11000111 oo001mmm is the
> instruction CMPXCHG8B. It is valid only on Pentiums (and beyond). It
> is an instruction that takes one operand. The "Instruction Set
> Reference Manual" from Intel gives the type of the operand as "m64",
> and defines "m64" as follows:
>
> "m64--A memory quadword operand in memory. This nomenclature is used
> only with the CMPXCHG8B instruction."
>
> And it describes the instruction itself as:
>
> "CMPXCHG8B--Compare EDX:EAX with "m64". If equal, set ZF and load
> ECX:EBX into "m64". Else, clear ZF and load "m64" into EDX:EAX."
>
> It also gives a pseudo-code description:
>
> if (EDX:EAX = dest)
>     ZF <- 1
>     dest <- ECX:EBX
> else
>     ZF <- 0
>     EDX:EAX <- dest
>
> where "dest" is the operand ("m64"), and ZF is a flag. (Now if that
> isn't a handy instruction, I don't know what is.) So now that we
> understand this instruction, you may be wondering about that last byte
> that looked like "oo001mmm". The "oo" describes "mmm" (the 001 in the
> middle is just part of the opcode). If "oo" is 00, 01, or 10, then
> we're dealing with some funky addressing mode. (I realize that isn't
> the technical term, but I always was bad with addressing modes. Sorry
> Prof. Grimshaw.) If "oo" is 11, then "mmm" refers to a register. So
> you can see that with this instruction, we never want "oo" to be 11,
> because no general-purpose register is 8 bytes wide.
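The "oo001mmm" layout described above can be pulled apart with a couple of shifts and masks. This is only a sketch: split_modrm and operand_kind are hypothetical helper names, and "middle" is just a label for the three opcode bits in the centre of the byte.

```python
def split_modrm(byte):
    """Split the last opcode byte into (oo, middle, mmm).

    oo     = top 2 bits (addressing mode, 0b11 means "register")
    middle = next 3 bits (part of the opcode for this instruction)
    mmm    = low 3 bits (which register or base register)
    """
    oo = (byte >> 6) & 0b11
    middle = (byte >> 3) & 0b111
    mmm = byte & 0b111
    return oo, middle, mmm


def operand_kind(byte):
    """Say whether the byte names a register or some memory addressing mode."""
    oo, _, _ = split_modrm(byte)
    return "register" if oo == 0b11 else "memory"


# 0xc8 is the fourth byte of the sequence quoted in the message.
print(split_modrm(0xC8), operand_kind(0xC8))
```

For 0xc8 the fields come out as oo = 3 (binary 11), middle = 1 (binary 001) and mmm = 0, so the operand is a register.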
>
> Anyway, "oo" is 11, and "mmm" is 000, which corresponds to the
> register EAX. So what is happening is we are trying to manipulate a
> four byte register as if it were eight bytes. The disassembly of those
> opcodes looks like
>
> lock cmpxchg8b %eax
>
> (This is from gdb on Linux, which uses the AT&T syntax for
> assembly. For a quick tutorial, check
> http://www.rt66.com/~brennan/djgpp/djgpp_asm.html.)
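To make the disassembly step less magical, here is a toy decoder that handles only this one instruction, with or without the LOCK prefix, and prints AT&T-style text. It is a sketch, not how gdb does it: the function name disassemble and the REGS table are invented for the example, and it deliberately rejects every addressing mode except a plain register or a plain register-indirect operand.

```python
# 32-bit register names, indexed by the 3-bit "mmm" field.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code):
    """Toy decoder for "[f0] 0f c7 oo001mmm" only; returns AT&T-style text."""
    data = bytes.fromhex(code)
    prefix = ""
    if data[:1] == b"\xf0":            # optional LOCK prefix
        prefix, data = "lock ", data[1:]
    if len(data) != 3 or data[:2] != b"\x0f\xc7":
        raise ValueError("not an encoding this sketch understands")
    last = data[2]
    oo, middle, mmm = last >> 6, (last >> 3) & 0b111, last & 0b111
    if middle != 0b001:
        raise ValueError("middle bits are not 001, so not CMPXCHG8B")
    if oo == 0b11:                     # operand is a register
        operand = "%" + REGS[mmm]
    elif oo == 0b00 and mmm not in (0b100, 0b101):
        operand = "(%" + REGS[mmm] + ")"   # address held in a register
    else:
        raise ValueError("addressing mode not handled by this sketch")
    return prefix + "cmpxchg8b " + operand


print(disassemble("f0 0f c7 c8"))
```

Run on the four bytes from the message, it prints the same line gdb gave.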
>
> That is obviously an illegal instruction. If you get rid of the
> initial "f0" and just try to execute "0f c7 c8" (that is "cmpxchg8b
> %eax"), you get an illegal instruction error -- even on Pentiums. If,
> on the other hand, you change the first two bits of the fourth byte to
> 00, then you are not referring to a register anymore. You are
> referring to the address contained in the register. That is, the
> opcodes "f0 0f c7 08" disassemble to
>
> lock cmpxchg8b (%eax)
>
> and since the random value in EAX is probably not in your address
> space, you get a segmentation fault (or a GPF if you are of that
> persuasion). So that's basically what is happening. But that leaves us
> with some questions. Here are the questions and answers:
>
> Why does the lock need to be there for this to freeze the computer?
>
> The simple answer is: I'm not sure. Now for the complicated
> answer. Let me take another quote from Intel's Instruction Set
> Reference. This comes from the section about CMPXCHG8B:
>
> "This instruction can be used with a LOCK prefix to allow the
> instruction to be executed atomically. To simplify the interface to
> the processor's bus, the destination operand receives a write cycle
> without regard to the result of the comparison. The destination
> operand is written back if the comparison fails; otherwise, the source
> operand is written into the destination. (The processor never produces
> a locked read without also producing a locked write.)"
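One way to see what that quote is saying is a small software model in which memory counts its own writes. This is purely illustrative: CountingMemory and cmpxchg8b are hypothetical names, plain Python integers stand in for 64-bit quadwords, and nothing here models real bus timing.

```python
class CountingMemory:
    """Dictionary-backed memory that counts how many writes it receives."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.writes = 0

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1


def cmpxchg8b(mem, addr, edx_eax, ecx_ebx):
    """Model of the quoted behaviour; returns (ZF, new EDX:EAX)."""
    dest = mem.read(addr)
    if dest == edx_eax:
        mem.write(addr, ecx_ebx)   # comparison succeeded: store ECX:EBX
        return 1, edx_eax
    mem.write(addr, dest)          # comparison failed: old value written back
    return 0, dest


mem = CountingMemory({0x1000: 5})
print(cmpxchg8b(mem, 0x1000, 5, 9), mem.writes)   # comparison succeeds
print(cmpxchg8b(mem, 0x1000, 5, 7), mem.writes)   # comparison fails
```

The write counter goes up by one on both calls, which is the "write cycle without regard to the result of the comparison" part of the quote.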
>
> So what I get out of that is that the LOCK on the bus is in effect
> until there is a write back to the bus. But the instruction fails, and
> does so entirely within the processor (since we aren't accessing
> memory outside of the processor). So there isn't a write to the bus
> because of the error, and the LOCK isn't released. But we can't
> recover from this error because we can't get the bus until there is a
> write by this stupid instruction that will never write to the bus
> because there was an error which will prevent it from ever writing to
> the bus to release the LOCK... Well, you get the idea.
>
> If that is really what happens, then it should be possible to achieve
> the same effect with other opcodes. I don't know assembly too well,
> but I figure you should get a similar result with any locked
> instruction that can produce this sort of runtime error completely
> within the processor. CMPXCHG8B just happens to be a good one because
> it is the only instruction (that I could find) on the Pentium that
> tries to write 8 bytes to a location that is stated explicitly. The
> instructions that deal with 1, 2, and 4 bytes will choose a 1, 2, or
> 4 byte register as is appropriate (e.g. if you try to get an
> instruction to put 4 bytes into AL, it will just put it in EAX --
> there's no getting around that).
>
> Why does this not happen on computers before the Pentium?
>
> Because the Pentium is the first computer to have the CMPXCHG8B
> instruction. Thus, these opcodes are just gibberish on older
> computers. "But," you might ask, "won't it still lock the bus and then
> have an error with the gibberish?" The answer is No. Remember, LOCK is
> not an instruction -- it is an instruction prefix. Sure you can write
> an assembly line with lock all by itself, but it will just be
> prepended to the next instruction. So LOCK needs a valid instruction
> to lock, and on those older processors CMPXCHG8B is not a valid
> instruction.
>
> But that doesn't preclude the possibility of a similar thing happening
> on earlier computers. It's just a matter of finding an instruction
> that can produce a similar error. For example, on a 286, the
> instruction LIDT takes a memory-only operand (six bytes; Intel's
> manual lists it as "m16&32"), so a register form is invalid there
> too. If you could use this instruction, you could probably do the
> same thing. However, as that is
> "Load Interrupt Descriptor Table" I'm guessing that you might have
> trouble getting your operating system to let you run it. (Then again,
> I hear that the 286's are pretty funky with protected mode stuff.)
>
> Why does this not happen on computers after the Pentium?
>
> If Intel was actually unaware of this before now, then this is really
> one of those "missed it by that much" (I'm holding my index finger and
> thumb about a centimeter apart) situations. Again, from the Intel
> "Instruction Set Reference" (this time it's about the LOCK
> prefix):
>
> "Beginning with the Pentium Pro processor, when the LOCK prefix is
> prefixed to an instruction and the memory area being accessed is
> cached internally in the processor, the LOCK# signal is generally not
> asserted. Instead, only the processor's cache is locked..."
>
> So there you have it. The computer knows that this register is in the
> processor (even though it doesn't know that the instruction will
> fail), so it doesn't assert the LOCK signal, and the bus is free for
> error recovery.
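The chain of reasoning in the message (lock taken, instruction fails inside the processor, no write, lock never released) can be written down as a toy decision function. To be loud about it: this models the author's guess only, not documented Pentium or Pentium Pro behaviour, and the name outcome and both parameters are invented for the sketch.

```python
def outcome(faults_in_processor, asserts_bus_lock):
    """Toy model of the message's reasoning; not real hardware behaviour.

    faults_in_processor: the locked instruction fails before touching memory.
    asserts_bus_lock:    the LOCK prefix really locks the external bus.
    """
    if not faults_in_processor:
        # The locked write happens, and that write is what ends the lock.
        return "completes"
    # From here on the instruction has failed, so no write ever happens.
    if asserts_bus_lock:
        # The bus stays locked, and error recovery needs the bus.
        return "hangs"
    # The bus was never locked, so error recovery can proceed.
    return "illegal instruction error"


print(outcome(True, True))    # Pentium, as the message describes it
print(outcome(True, False))   # Pentium Pro, as the message describes it
```

Under those assumptions the first call reports a hang and the second an ordinary illegal instruction error, matching the two stories told in the message.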
>
> And that's all I got to say about that.
>
> -----------------
>
> /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * Aaron Schwartzbard E-Mail: ads5x@virginia.edu *
> * Phone: (804) 243-1957 *
> * - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - *
> * I don't know karate, but I know crazy. --James Brown *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
>