Wednesday, August 11, 2021

News Updates: MTE

 For anyone who reads that is actually tracking this issue, this won't be news, but I haven't seen that much buzz about it online and I think that it's a pretty big thing.

In Android 11, Google added some special, software-based checks that were meant to emulate MTE by causing crashes if some app were to abuse a pointer to the heap. This was framed, essentially, as trying to ease developers into the mindset that you should be using pointers as they're given to you and not doing stupid things with them, and if you continue to do stupid things in the future, your code might stop working. So Google included this feature, but allowed it to be disabled.

Since then, google has published the Android 12 beta and enumerate a list of Compatibility Framework changes.

The above link should take you to the relevant section, but if you're reading this far in the future, I'll summarize:

Google has added the options for using MTE on hardware that supports it and added flags for two different tag checking strategies (Synchronous and Asynchronous).

Currently, by default, these MTE options are disabled, however, this further goes to show that Google is really hammering down on MTE support.

With the release of the Pixel 6 upcoming, I think it's safe to say with some certainty that it will support MTE.

I may sound like I'm freaking out over some obscure feature, but... Properly implemented, this is the same type of generational leap in security that the security community saw when ASLR was introduced, when W^X came about, when Apple adopted PAC on their iPhones. MTE, if handled correctly can kill large swaths of existing bugs and exploits and make finding new ones far more difficult than before.

I'm on the edge of my seat, don't know about you. Let's see what happens...

Security Concepts: Hash Functions

A note on timing: I know that it's been a little while since I posted last, and I haven't given up on blogging, I'm just balancing life. My last post on MTE was a big one that I worked on pretty constantly for a while and I got a little burned out, to be honest. I've been focusing on work and personal projects that I may write up at some point. Just saying, I'm here, talking into the aether.

To the post! 

I'm starting with hash functions because they have the simplest explanation in theoretical terms, even if their implementations are among the most complex algorithms. I'm not going to go into how specific hash functions are implemented here.

If you've taken one of the most basic computer science courses, usually called something like "Data Structures", you are likely familiar with a structure called a "hash table". The key piece of a good hash table is the hash function. If you remember that class really well (or you're currently in that class) you'll remember that the general requirements are that the function must be fast and is usually desirable for it to have a flat distribution of outputs based on inputs. (Spread the keys across as many buckets as possible.) With these you're on the right course, but Cryptographic Hash Functions (the only ones we care about in security) have more complete requirements.

Pause for nomenclature:

hash - /haSH/

  1. (hash function) (N.) Function that performs a hash operation
  2. (V) to perform a hash operation via a hash function
  3. (N.) the output of a hash function
  4. (N., illicit) ask your stoner friends
I bring this up to clarify that the use of the word "hash" can be confusing because it is the actual function, the action taken by the function, and the output of the function, depending on the context within which it is used.

Now, a little background on why we want hash functions in the first place.

Purpose

The purpose of a hash function is to take an arbitrary data input, having an arbitrary size, and process it and output data of a predetermined size. 

I know this sounds like saying "The purpose of a hammer is to hit things," which is technically true, but it obviously misses some important context. So...

Where would a hash function be used?

Generating Random Data

Generating random data from a hash function is the less common use case, but it's worth mentioning because it does happen, and it's important to know that this is in your toolbox.

There are plenty of reasons that you may need random data, but since hash functions guarantee that the output will be the same for any consistent input, they're useful for key perturbation (e.g. mixing stream specific attributes with a master key to derive stream key), generating seed data for RNG things (seeding rand with current time is great, but what if you seed it with super random things?), or any other case where you need a pseudo-random source of data that is either shared or re-calculable.

Obviously the first example is more clear that you would want a cryptographic hash function since it invokes a cryptographic use case, but the second may be just as important, what if the RNG is driving an online gambling platform? As close to pure random as possible can be extremely important sometimes...

Digesting Data for later Validation

Just the wording of this use implies something like a checksum, and this is true sometimes, hash functions can be used, and often are used, as checksums. The important thing here is that, while checksums traditionally protect the recipient from transmission errors, using a cryptographic hash will also protect the channel integrity from attackers who wish to modify the message. If an attacker was to modify the message but leave the hash alone, the recipient would know the message was invalid. Further, by using a cryptographic hash function (as opposed to a simpler checksum technique), the attacker has a harder (potentially impossible) time making the new data fit the original checksum.

Proof/Evidence of Existence/Knowledge at a Certain Point in Time

Imagine you could type some message, the tweet the digest and some metadata about your message (but not the message itself). Then later, you may choose to publish that message along with a reference to the tweet. As long as you haven't modified the message, it can be said that it is exceedingly likely that you had already typed that message out before you published the tweet.

Passwords work in a similar way, except the whole point is to obscure the "message" (password). Properly implemented password systems will use cryptographic hash functions in order to derive some data, then that data is stored, instead of storing the password. Then, when your identity needs to be verified, you supply your password, the data is derived again and compared to what is stored in the server. In fact, this can be implemented in such a way that the website never actually sees your password. This is a form of zero-knowledge proof. Basically, you're proving to the website that you have some data that can be used to derive some number, you know the solution to a complicated problem that, at the time, the server does not posses.

I can foresee myself writing a post at some point going into all the different ways, most of them incorrect and dangerous, that password management and validation can be handled, but this is not the place.

*Sigh*... I guess I should mention "blockchains"...

I may... MAY create a post about blockchains later, they're kind of played out on the explanation side, and it's become such a buzzword... At one point, I literally sat through a meeting with some business people and some engineers brainstorming how a company could market a "blockchain product"...

Anyway. Hash functions are at the core of blockchains. Each consecutive block in a blockchain has the hash of the previous block, this field must be included when calculating the hash for use in the next block to include. This is where the term "blockchain" comes from. The "block" is the data to be committed to the dataset, and the "chaining" happens when you include a hash of a previous block in a new block. The blocks are then all stored in a giant hash table by their cryptographic hashes. This way, if you would like to trace the chain from the head to the original block, you can just fetch by hash.

How would one use a "hash function"?

Hash functions are typically pretty easy to use, actually. Here's some examples:
Bash:

echo "asdf" | md5um

Python:

import Crypto.Hash.MD5 as MD5
md5 = MD5.hashFactory() # Get an MD5 context
md5.update(b"asdf\n") # add data to context
from binascii import hexlify # Get a way to print pretty
print(hexlify(md5.digest())) # return the hash and print it as hex

I would give a C example, but C can be dependent on which implementation you choose. Sometimes it uses uint8_t arrays for everything, sometimes there's typedef's for everything, sometimes you get a magic function that does it all, sometimes you have to construct and maintain and destroy a context, etc. It will usually look similar to the python above, construct a context, feed data through the hash function to update the context, get the output.

If you really want to play with it in C, you're better off finding the crypto library you wish to use and looking up an example of that specific library.

What makes a "Cryptographic Hash Function" a Cryptographic Hash Function?

I've talked a bit about use cases, how it's used, why it's used, why you should choose a cryptographic hash function over a non-cryptographic hash function, but what makes a "cryptographic hash function"?

Here, we'll have to go to the theoretical side and I'll describe the "theoretically perfect cryptographic hash function". We'll call this function: H. You know, for... Hash.

What does H look like? What is its signature?

H takes in some set of data and outputs a hash of that data that is always the same size, in pseudo-C, it would look like:

uint8_t[HASH_OUTPUT_SIZE] H(uint8_t* data, size_t data_size);

 How does it work?

This is actually not an important question. There are different approaches to hash functions that are used, but "how" is not actually important as long as it meets certain metrics.

Requirements:

All of these requirements are met in a theoretically perfect has function.

  1. Given the output, the input cannot be derived
  2. Output is only dependent on the input
  3. Outputs given two different but similar inputs are unpredictably different
  4. Output is collision resistant

Input Cannot be Derived

Hash functions are meant to be one-way. This should be fairly intuitive. H can take an infinite number of inputs, length ranging from 0 to infinity, on length alone, the number of possible inputs outnumbers the number of possible outputs by a factor of... well.. infinity. This means that, if you had a perfect mapping of inputs to outputs, it would be impossible to derive the input that caused a given output with 100% certainty because there would be an infinite number of inputs that could have caused that output.

So, by necessity, hash functions must be at least in part 1-way, but, the idea here is that for ANY output it should be exceedingly difficult to derive the possible inputs and choose the correct one.

Consider H is used to perturb the value of a password, you would hope that the original password could not be derived from H(password).

Output is Only Dependent on Input

This is exactly as it reads. Essentially, if you take the same input data and feed it through H, regardless of what system it runs on, what time of day, date, timestamp, astrological sign, color of sky, anything else, it should ALWAYS return the EXACT SAME THING.

Outputs Given Two Different Inputs are Unpredictably Different

This requirement becomes more important in cryptographic systems. If some attacker has some possible set of message formats, and they may observe the hash of some message, you don't want them to be able to derive which format is in use based on the hash (so long as there is some unknown portion of the message). The real reasoning behind why this is really important is hard to explain without understanding other topics that I have yet to cover, so I'll exclude those reasons for now, but just know that this is an important property.

Output is Collision Resistant

Collision Resistant sounds more like a property that I wish my last car had than a property of cryptographic functions, but it's a pretty foundational concept in hash functions. The idea is that for completely random inputs, there should be a flat distribution of outputs. For two randomly selected inputs, there should be two random outputs to pair with them. This really comes down to an efficiency issue, if the output of your hash function is 32 bits wide, why actually output less than 32 bits of entropy?

Common Misconceptions

Hash Functions are Encryption Functions

This is just not true. It may seem like semantics, but there is a really important distinction. Encryption functions are reversible; they are two-way functions. Remember earlier when I said that hash functions map an arbitrary size (potentially very large) of input to a constant size output? And remember how I said that this contributed to hash functions being irreversible? Hashing is not the same thing as encrypting. Every time I hear some cyber news source talk about how some company leaked a database of passwords but not to worry because "the passwords were encrypted" it makes me cringe. If they are wrong, and the password hashes were leaked, should this person be reporting the news? If they're correct and the passwords were protected by a reversible algorithm... Why shouldn't I worry?...
Again, it may seem like I'm nit-picking here, but, especially when you start talking about different security algorithms, hash functions are absolutely not encryption.

Hash Functions Add Entropy

Reference above where I said that hash functions always output the same data for a given input. Imagine if you have 2 bits of entropy. So, you feed a a hash function 4 different inputs: 0x00, 0x01, 0x01, 0x03. Every time you randomly select one of those inputs and feed it into the hash function, you will get one of four outputs. If you were publishing those hash results, while an observer may not be able to derive the mapping of output to input, they would be able to discern that there were four possible values. This is due to the amount of entropy in the input and the fact that the hash function doesn't change it. In fact, in actuality, hash functions may dump entropy. Sounds weird, but if you think about it, it makes sense. A data set cannot have more entropy than its length. So if you got data from a theoretically perfect random number generator that gave you 4KB of entropy in a 4KB message, then you calculated the hash of that message, you would loose almost all of that entropy. The hash would only contain at most as much entropy as its size allows.

That said, hash functions can be used to expand, disguise, or condense entropy. If you can generate a pure 64-bit random number immediately but must wait a long time to generate the next, you may feed that number into a hash function, take the lower 64 bits of the result and release it as a next number, re-hash the output to get a new output, release the lowest 64 bits, repeat. This does not add entropy, knowing what's happening, there's still a maximum search space of 64-bits, but if a pseudo-random stream that only has 64-bits of entropy is ok with you, then...
Disguising entropy goes along the same lines as expanding or stretching in the last example, basically, you may have some relatively small entropy pool and for functional reasons that is ok, but you want it to appear more random. To a human. That is NOT doing any numerical or statistical analysis.

Condensing entropy is actually an interesting concept and my favorite use of hash functions because it really is neat. Imagine you have some data source that has lower entropy than its length. imagine a book. Human text is really low entropy, especially when considering that even if you had a random stream of english, ASCII characters, you're using at most 7 bits out of every byte and not even the full range of those 7 bits. But, if the text to be hashed is longer than the output of the hash, you can start to get closer to the maximum entropy of the hash function. So, if the input text and output size were matched, the output would have less entropy than its length, but if the input were 500 times the length of the output, it may have the maximum entropy allowable in the hash.

HMAC is the same thing as Hash Functions

This is like saying that the ground beef is a burger. While this is technically... sort of correct... There's a lot more to it and the beef is only part of the burger.
First, HMAC is something called a Keyed Hash Function. Essentially, it's a hash function that derives extra perturbation from a secret key. This has some of the same uses as IVs do in encryption (if you know what those are for). HMAC works by running a hash function multiple times taking both a message and a key as input. I direct you to search HMAC on wikipedia for more information, it's a quick read.

That's it!

You should be pretty set at this point. Obviously I'm not covering all different hash function or how any hash function works in one blog post but... Hopefully someone finds this helpful.

Friday, June 4, 2021

Technical Mitigations: Memory Tagging

This post was originally about a quarter of the length, but as I edited it, it continued to grow. I have put a lot of thought into MTE and I hope that some of the understanding that I've gained is absorbed by reading this. I suggest reading this with a good drink. In writing this I had a manhattan, an IPA, a golden ale, unsweet tea, double espresso, flavored water, so basically anything (maybe a grown-up drink to numb the pain of any math involved?) (Or if you're reading for the real information, tea?) 

This post is about Memory Tagging. The inspiration for this post is the Memory Tagging Extension (MTE) outlined in ARMv8.5-A. So, I'll draw all my examples from AARCH64 in ARMv8.5. If you're into non-ARM architectures... Meh. iPhone and Android run on ARMs. Deal with it.

Tagged Memory, Not Tagged Pointers

Clear the air here, Memory Tagging is not the same as Tagged Pointers. In both, bits are stolen from the pointer value to hold extra data, but tagged pointers are more generic and typically used for book-keeping things like reference counts or address granularity. Memory Tagging is a very specific use of stolen bits for security/reliability purposes.

Exploit Techniques

I know what you're thinking, "Ronman, I thought this was supposed to be a post about MTE, we are talking about exploits?" Yeah. Yes we are. This is important because we can't really appreciate what this mitigation technique does for us without understanding what it's supposed to prevent. Therefore, we're going to go over a few exploit classes relevant to the subject.

Note: If you're familiar with the general types of exploits, skip to "What do they have in common?".

Buffer Over/Underflow

This is the simplest bug possible really. Imagine an application that asks for your password. The application has reserved 64 bytes of space on the stack to store the password that you enter because, who would have a password that's 63 characters long? (Don't forget the null-terminator!) Well, what happens if you enter 68 characters? If there's no bounds checking on the input function, then you can actually overwrite extra memory on the stack. Or the heap. Or in an object that has a member with a statically sized buffer for... something.

Buffer underflows are similar to buffer overflows in that they involve writing outside the allocated buffer. The difference is that instead of writing after the allocated buffer, the bug exploited allows an attacker to write before the allocation.

Buffer over/underflows can also be caused by providing a bad length for an object, in some protocols structures/etc. are sent with a preceding length, so if you tell the other side you are going to send 32 bytes, they allocate 32 bytes, but allow you to write 36 bytes, that can be bad.
Integer Over/Under flows can cause bad indexing into a buffer. This is another form of buffer over/underflow but is more often characterized as an integer overflow bug, as that is the underlying error that allows out of bounds access.

Use After Free

As the name implies, a use after free exploit takes advantage of a bug in software where a reference to a struct/object is used after the actual object has been freed and already re-used for something else. This can happen if the interface of some function requires that a pointer passed to it be long-lived or passes ownership of the object but the caller does not treat it as such.

Easy ways to cause bugs that may result in use after free vulnerabilities would be things like:

  • Passing a stack address where a long-lived address is expected
    • Think, enqueueing something in a list, adding to a hashmap, or a pthread context parameter
  • Misusing functions that take ownership of memory passed to them
    • Picture a function that promises to call 'free' on a pointer passed to it

The thing to note here is that it becomes possible to have some code act on a "stale pointer", that is to say that the data at the address to which the pointer points no longer contains the data expected. Stale pointers on their own aren't bad, even using them could cause no noticeable problems. (DON'T USE STALE POINTERS) Where things get tricky is if the address to which the stale pointer points becomes allocated for something else. The stack address passed into a pthread context parameter becomes allocated for some function's stack frame in the thread that started the pthread. Now the second thread has a pointer to the stack of some other thread! Or in the case of passing memory to a function that will free it, then trying to use the memory subsequently. Then the pointer after the function call is stale and that address may refer to something that is allocated and in use by something else.

The point is that use after free is a exploit technique that takes advantage of situations wherein stale pointers cause effects in memory that has been reallocated since the pointer went stale.

Type Confusion

This is an edge case for memory tagging as a mitigation, but I bring it up because it can technically be helpful.
In a type confusion, an interface is abused/ignored. This means casting a polymorphic type down the chain of inheritance farther than you should, or casting a 'void*' to something that it's not, or misinterpreting some other thing to cause you to treat something some other way than you should.
Specifically in the polymorphic cast, and others where the incorrect type is larger than the correct type, then a read/write on that bad type could cause an out-of-bounds access on the allocation of the real type with respect to the address of the real type.

What do they have in common?

Out of Bounds Access. That's really what it comes down to. Out of bounds in space OR time. Either. Or both. A variable or structure exists within some memory range AND some time range. If you access it outside of either of those ranges, bad things can happen.

How do these show up in the wild?

OOB bugs are everywhere. Like, everywhere. They can be the driving bug that gets malicious code to execute or they can be one of a few in a chain that leads to code execution. Breaking the ability of malicious actors to have OOB access for read/write is fantastic for security.

What the Heck is Memory Tagging Anyway?

Memory tagging is a process by which the system tries to revoke invalid access to memory. This is the technological answer to the techniques discussed in the previous section.

Memory Tagging involves a separate memory space, reserved specifically for memory tagging, from here on, I'll refer to this as the Tag Store.

It also requires that upper bits are stolen from pointers. This usually isn't a problem in 64-bit architectures, because 64 bits can address way more memory that you would have mapped into a process. That said, this mitigation doesn't really work on 32-bit systems. In ARM, MTE uses the TBI (Top Byte Ignore) feature; this means the top byte is handled as completely separate from the pointer. Four out of the eight top bits are reserved for the memory tag. It's important for this to be the top byte because when arithmetic is done on the pointer, the tag should not be modified.

Memory Tagging works by acting at three different times:
  1. Memory Allocation
  2. Read/Write
  3. Memory Deallocation

You should note that these are basically the three stages of an allocation's lifespan, so, it kind of makes sense. Remember I mentioned it was the spatial and temporal boundaries of an allocation that this is meant to enforce.

I'll go into those action stages below, but I want to make clear the general concept of MTE. MTE essentially applies a lock-and-key system to read/write access to memory at the hardware level. The tag bits in the top byte are like the key to a door. (Not to be confused with a cryptographic key.) The tag store combined with the actual Memory Tagging Extension (a subcomponent of the CPU) make up the lock. Much like you can't sit at your neighbors door and try all 100,000 possible keys to see which will open their door, an attacker can't just try all 16 possible tags -remember the tag is only 4 bits wide. If one tag validation failure is detected by the tagging extension, bad things happen...

Memory Allocation

During allocation, a tag is assigned (randomly, but possibly such that the adjacent allocations have different tags, it depends on the implementation). The same tag is assigned to the entire block allocated. In  ARM, this is done with a resolution of 16 bytes. The idea of taggable blocks being larger than one byte is common because, with a tag size of 1/2 byte, your Tag Store would have to be 1/2 the size of your usable RAM, 1/3 of the total. With a resolution of 16 bytes, 1/32 of your total memory must be used for the tag store, that is 3.125%.

The tag is inserted into to the pointer before it is returned from the allocation function (malloc/calloc/etc). So if you were to wrap malloc with memory tagging logic, it would look something like this:

void* malloc_mte(size_t alloc_size) {
    // MTE works on 16B segments, allocations must be 16B increments
    size_t delta = alloc_size % 16;
    // Adjust size if needed
    if (delta) alloc_size += (16 - delta);
    // Perform normal malloc allocation
    void* allocation = malloc(alloc_size);
    // Don't try to tag NULL, that'd be silly
    if (allocation == NULL) return NULL;
    // Tag the region
    uint8_t tag_byte = apply_tag_to_range(allocation, alloc_size);
    // Apply the tag to the pointer
    return (tag_byte << 54) | allocation;
}

In a proper implementation, the malloc function would do the tagging on its own, also there are instructions in ARM to apply the tag to a memory region a pointer to the region at the same time, but for the sake of example, I separated that.

Read/Write

During a Read or Write operation, the pointer that holds the address must also contain the correct tag value. In this case, correct means that the bits embedded in the pointer match the bits in the tag store associated with address range of the read/write. If the tag is correct, the read/write happens successfully, like normal. If the tag is incorrect, then the system faults. This can result in a segfault, an exception, a hardware interrupt of some sort; basically, it's up to the implementation. What makes MTE so great is that this validation and possible fault happen during every read or write instruction. That is to say that there is no special "tagged read" or "tagged write" instruction. If MTE is enabled, the normal LDR, LDP, LDM, STR, STP, STM, etc. will cause the tag in the address register to be validated. So if normal code uses normal means to allocate memory (eg malloc/calloc or even mmap) then any time an address in that range is referenced, the tag is validated, completely invisibly to the code.

This process can be pseudo-coded as the following functions for generic 'read' and 'write':

uint64_t read64(void* addr, uintptr_t offset) {
    void* read_addr = addr + offset;
    if (!check_tag_for_addr(read_addr)) raise(SIGSEGV);
    return *((uint64_t*)read_addr);
}
void write64(void* addr, uintptr_t offset, uint64_t val) {
    void* write_addr = addr + offset;
    if (!check_tag_for_addr(write_addr)) raise(SIGSEGV);
    *((uint64_t*)write_addr) = val;
}

Now, again, this process would take place within the assembly instruction execution, so, the last line in each of those functions, the one that actually reads/writes, would be the only assembly instruction. With MTE enabled, the assembly for those functions would look like:

read64:
    LDR X0, [X0, X1]
    RET
write64:
    STR X2, [X0, X1]
    RET

Obviously, these functions would never be used, the instructions would just be in-line in functions, as normal read and writes. That's what makes it so great, the validation happens within the normal read/write.

Memory Deallocation

Deallocation has two effects. First, the tag is validated. If the validation fails, bad things happen. In the same way as with the read/write faults, a stale pointer is being freed, this is bad, so a fault is raised. How can I say that it's a stale pointer being freed? Because if the tag validation succeeds, the allocation is un-tagged (or re-tagged, again, up to implementation). This means that a double-free will likely result in a tag being invalid.

The implementation of this is a little more complicated, and necessarily intertwined with the implementation of 'free' because, if you remember your stdlib signatures, free takes only the address of the allocation to be freed. The heap implementation will be able to derive the size of the allocation, and therefore will be able to un-tag all of the allocation that was tagged, but without going into the internal function of heaps, it's just a little complicated to show here.

Isn't segfaulting dangerous?

Yes. In short, just raising a segfault is dangerous; it means the app will crash. But the system will be allowed to continue. So it's not really that bad. Further, segfaulting is much better than risking a malicious actor having arbitrary memory access.

I must also say that there are multiple proposed strategies to handle a failed tag validation. The most extreme example is a segfault. This is would would be described as a "fail-secure" solution. There is an alternative strategy, a less extreme, but no less effective.

The less extreme strategy is to record telemetry data and possibly enable higher levels of surveillance on the offending process when an invalid tag is detected. Virtually all exploit chains require more than one out of bounds read/write. This means that they will almost certainly be detected. When detected, anything they do can be more easily reverse engineered, and any further vulnerabilities can be patched. Even the initial vulnerability may be caught on the first invalid access, depending on what form it takes.

Further, combined strategies can be used, eg, if a process fails 3-5 tag validations, that process is stopped. This would help to mitigate really bad memory management so applications would be essentially unusable unless they were correctly handling memory. When deployed, this strategy would also be likely to stop attackers because that is still a very small number of accesses.

Wait... This sounds a little familiar...

If you're saying this, you're probably thinking of Address Sanitizer (ASAN) or Hardware-Assisted Address Sanitizer (HWASAN). Yes, it's similar, but this takes that idea further. ASAN is software driven and adds a lot of overhead in space and time, AND it won't catch near as much as MTE will. Even Hardware-Assisted ASAN, being faster/smaller than ASAN, pales in comparison to MTE. (Or similar implementations.)

MTE has much less code overhead, because it's built into the CPU architecture it's significantly faster, and it requires much less memory. On top of all that, MTE is much more capable of detecting bad access. Probabilistically, but greater than 90% of the time, out of bounds access, use after free, stack smashing, heap smashing, even some type confusions can be caught at runtime.

How Exactly?

We've talked a lot about exploits and MTE, compared it to (HW)ASAN, but how does it actually catch bad guys red-handed?

Basically, these bugs rely on accessing memory outside the allocation they reference. Either in space or time.

Buffer Over/Under flow or Type Confusion

I'll group Buffer Abuse and Type Confusions together here because, as far as MTE is concerned, they work in the same way: Access outside the spatial allocation. Remember, the point of MTE is to protect against access outside the allocation. The way it does this is by catching invalid access via bad/unmatched tags for the address. When you try to access an address that is an arbitrary offset from the address with a proper tag, there is a 1/16 (6.25%) chance that the tag will actually match and the read/write operation will succeed. This means the 15/16 times (93.75%) an invalid access will be caught before it has the chance to do anything bad. This may seem like it leave room for attackers, however, it really doesn't. Even if whatever process let an attacker try a read or write was repeatable, they would likely require more than one read/write to accomplish the goal. With two operations, the odds of getting away with it drop to .391% (1/256).

Use After Free

This type of bug comes about when memory is not being handled safely. Specifically, if memory is allowed to be reused before it's free, or freed and reallocated before its last use.

Here's an example of what I mean:

uint32_t* first = calloc(1, sizeof(uint32_t));
uint32_r* second = NULL;

free(first);

second = malloc(sizeof(uint32_t));
*second = 0;
*first = 1;
*second++;
printf("First @ 0x%016" PRIxPTR " = %i\n", first, *first);
printf("Second @ 0x%016" PRIxPTR " = %i\n", second, *second);

Hopefully you can see that this will result in undefined behavior. Everything is fine until the line that has "*first = 1;". Because 'first' has been freed, that pointer is stale. It may point to some unused place on the heap, or it may point to the same place that 'second' points. There's no way of knowing from this snippet. However, if MTE is applied to this situation, when 'first' is freed, that memory will be untagged, and when second is allocated, that memory will be tagged. So let's examine the possible outcomes:

First Points to Unused Heap Space

In this case, access to first will be caught by the tagging extension because first still contains a tag in the top byte, but that memory space has been untagged already. So that will almost certainly cause a fault.

First Points to the Same Memory as Second

In this case, the element in the tag store that tracks the space allocated for 'second' has been retagged since it was allocated for 'first', so, again, the access will almost certainly be caught. 

That sounds great, but how many arms and legs will it cost?

I know I'm selling this hard, but the benefits are important. There are costs, and they're not negligible. In my opinion, they're worth it, however, that will in the end, be up to the engineers working on tomorrow's designs to decide.

Transistors!

To me, the most obvious cost is actual transistors on the chip. Anytime you add functionality or capacity or anything to an integrated circuit (which processors are) it will cost transistors. These transistors could be used for something else.

If you don't know about the lower workings of processors or ICs in general, think of the substrate as a markerboard. You can draw whatever you want, but you're only allowed a certain amount of space. If you fill it all in with one component, you run out of space for anything else.

Tag Block Size as a Memory Cost

One cost of MTE is the added memory consumption. This comes in two different places. The first -and most obvious- is in the tag store. This store must exist somewhere, either in normal memory space or in dedicated hardware, it makes no difference, if you only have so much space on your chip, some of that space must be carved out for the tag store.

The second place you'll lose out on memory is the added consumption per reservation. Different allocation methods have different reservation resolutions, generally the stack is aligned with the register size of the CPU, so 32-bit CPUs use a 4-Byte aligned stack, likewise, 64-bit CPUs use 8-Byte aligned stacks. Unless you are using some special purpose memory structure -like a custom heap- the stack will usually use the smallest alignment. Malloc is guaranteed to return an allocated address that is aligned with the maximum aligned read/write size of the CPU, essentially, the same alignment as the stack. The important thing to note here is that malloc may return a more aligned piece of memory. If you're operating on a 64-bit CPU and malloc returns an address guaranteed to be 16-Byte aligned, then the guarantee that malloc returns 8-Byte aligned addresses is still satisfied. This is necessarily the case for MTE implementations.

Because the Memory Tag Extension only tracks memory addresses with a certain granularity (16 Bytes) the allocation system (the heap implementation in the case of malloc/free) must actually ensure that it only allocates increments of the tag resolution.

Tradeoffs on the Stack

Because using MTE will actually consume more instructions, there is going to be a processing time cost. Up to this point, we've basically been talking about MTE with respect to the heap, but, MTE can be used anywhere.

On the stack, MTE could be extremely useful for protecting members of the stack frame. The downside of this is that each element on the stack would likely need to be tagged separately, this means that rather than one instruction to allocate the stack space, it would take a number of instructions proportional to the number of variables on the stack. Then if there was stack space re-use, that space would likely need to be retagged as well!

Tagging Object Members and Array Elements

While it may seem like it could be possible to tag separate members or elements of structs or arrays, it's just not possible. In some higher level languages it may be possible, but for those the different elements in the classes/arrays are more like separate allocations. I'm really talking about low level languages (C/C++).

Imagine, for a minute... A Protocol Data Unit (PDU/Packet) with a type field and a payload field. Those could be tagged separately, right? Sure. Problem is, when you read the type field, it will tell you that you need to cast the packet to a different subtype of PDU. Anyone who's dealt with protocols or file formats will be familiar with this. It's everywhere. But what this means is that unless all the possible subtypes of PDUs have exactly the same structure, the fields in any subtype cannot be tagged. Because C/C++ data types are handled identically regardless of if they're nested in another type or on their own, this means that those sub-PDU types can never have their members tagged separately. Because any struct may be used as a subtype for a PDU (or cast from another struct type) (the compiler doesn't know what you're doing) no struct can ever be tagged.

Arrays are similar, basically what it comes down to is that C/C++ allow you to do some really nifty things with arrays and you can't start tagging the elements separately without taking a lot of power away.

Can we preserve ABI while implementing MTE?

ABI, or Application Binary Interface, that is the definition of the interface into the binary, functions, objects, etc, can be preserved.

Remember I said that everything between allocation and deallocation is handled the same as if MTE were not present. This means that only allocation and deallocation need be updated. On the stack, this is handled in the same place, so there's no interface change because the same code is allocating (would have to tag the memory) and deallocating (would want to re/un tag the memory).

Tagging the heap works in the same way. Despite free possibly being called in an entirely different part of the code as malloc, malloc and free are to be implemented by the same library, so whatever code is in charge of allocation may tag and deallocation code may untag. The callers of these functions can ignore that aspect.

If it were possible to tag different members of structs differently (it's not, see above) then that would require an ABI change, but that's not something we actually have to worry about.

Into the Wild

We've made it though the technical part of this post. I know it's a long one. If you made it this far, you might have several questions, one of those questions might be, "Ronman, why should we care about MTE, like, if no one has implemented it..."

You'd be correct. But also incorrect. I haven't heard of a system adopting MTE yet. That doesn't mean that the hardware doesn't exist, nor that software isn't catching up. From the circles that I primarily follow, there are two big names that come up with respect to MTE: Apple, Google.

Apple

Historically, Apple has led the charge on a lot of security/privacy related features, up to recently, MTE is no exception. Apple's current generation of mobile processor, the A14 is an ARMv8.5a, which would include MTE. Apple's latest desktop processor, the M1, would also include MTE. Where they haven't done anything yet is in software. While they likely have all the needed hardware support, they haven't rolled out the code to use it.

Google

Google announced that they were investigating using MTE a while back (w.r.t. 2021). This is an interesting switchup because Apple embraced PAC (Pointer Authentication Codes, a story for another day) and Google just seemed to wave as it passed. With MTE, Google seems more motivated though. In Android 11, they began inserting a dummy tag to all allocated memory. If your application frees this memory but has perturbed the dummy tag, your application will crash. This is just to prepare developers for when google throws the switch... Which may come soon.

Google's next phone (w.r.t. June 2021) will be the Pixel 6, which will have their first custom SOC. The Whitechapel chip is almost certainly contain MTE capable CPU hardware.

This is strictly based on rumor, Wikipedia only knows of two processor cores that support MTE Apple's Firestorm and Icestorm (https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_cores).

Rumors based on older information still list older cores that wouldn't support MTE. We'll just have to wait to see.

In Closing...

So MTE is going to drastically change the landscape for security in devices that implement it. This is going to be a really cool time.

If you found this interesting, I'm looking at writing up more on different technical mitigations, most won't be this long, but... Meh.

Further Reading

Armv8.5-A Memory Tagging Extension (Arm Whitepaper)

LLVM's Documentation on their usage of MTE: https://llvm.org/docs/MemTagSanitizer.html

If all else fails, "google it".


Saturday, May 22, 2021

Security Concepts: Introduction

I'm sure there are lots of places that have this information, in fact, I'm 100% sure because I'm not quite smart enough to have come up with any of this early enough to have my name on papers. I read this somewhere, you could surely read it in the same places. I am writing this, as will be my norm, to condense a set of knowledge (that I think might be helpful to someone) in a singular place.


So, who the heck is meant to consume this information? First off, whoever wants to read it. Second, the target audience is a relatively technical group, not unfamiliar with programming concepts...

Ok, really math concepts, but also, we're not really talking math... The types of things I'm talking about are functions, numeric bases other than 10 (specifically base 2), boolean logic/algebra, the idea that everything in math/computers is a number. That's basically it. If you generally understand that, skip this next little bit (to the poop). If you're a little fuzzy on one or two of those, this next section is for you...


What's a function?

A function is a process that takes a specific set of inputs to yield a specific set of outputs, generated with respect to the given inputs. If you start to think about it, functions are everywhere. Fast food? What you order decides what they give you. Fast food has a functional interface. Vending machine? Press buttons as input, get snacks. Functional interface. Important thing to take away is that the output of a function is defined by the inputs.


What are "numeric bases other than 10"? (And why base 2?)

MOST. IMPORTANT. THING. Numbers don't have bases. REAL numbers, the actual concept of 4 can be written in many ways, but it still represents the same thing. You and I can picture 4 items in our heads and be confident that we're talking about the same number of individual things. REPRESENTATIONS of numbers have bases. Essentially, the number of symbols used to define the set representable numbers within an order of magnitude is the base of the representation. We're, as humans, most familiar with "base 10". It's called 'base 10' because we actually use 10 symbols: 1, 2, 3, 4, 5, 6, 7, 8, 9, AND 0. (Weird side note, the base of a number system is written in base-10. Because we're humans.) So. The representation of each number is thought to have an infinite number of leading 0's (the smallest symbol representing a lack of anything) When counting, add 1 to the smallest end of the number (the right). If that digit is already at the max, add 1 to the next digit and set the lowest digit to 0. Repeat this same logic as you go up, if the second digit is already the max, set it to 0 and add 1 to the third digit, and so on.

With this info, you can now count in any base you want.

Base 10: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...)

Base 7: (1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 20, ...)

What about base 1? (1, 11, 111, 1111, 11111, 111111, 1111111, ...)

*Base 1 is a little weird, if you move with the idea that there are 0's preceding the significant digits, base 1 doesn't really work, but whatever.

Base 2:..

Wait, I said that base 2 was important right? Base 2, also known as Binary, is important because in modern computers, transistors allow us to track the state of things as OFF or ON. 0, or 1... See where I'm going with this? I'll skip all the stuff involved in the 3-5 undergraduate classes from which I derive that idea and just give you the answer, all numbers on your computer are represented, at their core, operated on, saved, written, interpreted as base 2. 

Computers count like this: beep boop beep boop.

(Translated from computer:) 1, 10, 11, 100, ...

100(base2) is the same value as 4(base10).

Further note on this, most of the time, programmers et al don't use binary to represent numbers. We always know that they're being handled as binary under the hood, but it's not efficient to represent them in written text as such. Humans take a hybrid approach to efficiency and readability and use something called "hexadecimal" or base-16. Now, there are only 10 Arabic Numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) So how do we have 16? We cheat. F comes after E comes after D comes after C comes after B comes after A comes after 9. So the sequence of numbers in an order of magnitude are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, etc.


Boolean Logic/Algebra? Is this a math post?

No... But everything is actually math...

For real, boolean logic is the set of operations (or functions????) that can operate on bits (a 1 digit, binary number) or larger binary numbers. These functions are: NOT, AND, OR, XOR, NAND, NOR, XNOR.

NOT: Takes one bit and inverts it:

    (1) -> 0, (0) -> 1

AND: Takes 2 bits and outputs a 1 if both inputs are 1:

    (1, 1) -> 1, (otherwise) -> 0

OR: Takes 2 bits and outputs a 1 if either or both bits are 1, or, stated differently, outputs a 0 only if both bits are 0:

    (0, 0) -> 0, (otherwise) -> 1

XOR: Takes 2 bits and outputs a 1 if exactly 1 of the outputs is a 1:

    (1, 0) -> 1, (0, 1) -> 1, (otherwise) -> 0

NAND: AND, but the output is inverted with NOT

NOR: OR, but the output is inverted with NOT

XNOR: XOR, but the output is inverted with NOT


That's as mathy as these posts are likely to get right there. All that computers do can be boiled down to those operations.


"Everything is a number" sure sounds like math to me...

You got me there. But follow me for a minute. Remember, above, I talked about the most important thing being that REPRESENTATIONS of numbers are the things that have bases? Well, your computer has 0 clue what a cat is. It doesn't understand what socks are. Or what the alphabet is, for that matter. But it can act on the representations of those things. The representation of a picture of a cat boils down to a large set of numbers. Don't believe me? Well, if you lookup a picture of a cat, and look really closely, you'll find it's divided into little rectangles, pixels. Each pixel is colored by mixing 3 colors. The intensity of the mix of each color in a single pixel is given by a number. So each pixel is made of numbers, each picture is made of pixels, so each picture (by extension) is made of numbers. Same thing with pictures of socks; there's nothing special about cat pictures. Letters are a little different, there's a different style of representing those, but that letters are represented to the computer as numbers. Historically, that representation was usually something called ASCII, but now it's usually Unicode. (This isn't a post about text encoding so don't worry, I don't even know what ASCII stands for.)


If you've made it this far and feel like continuing, great. You're either nodding along or wishing you'd skipped to the poop before. Either way, you've made it. I'm proud of you.


💩💩💩💩💩💩💩💩💩💩💩💩💩💩


This post is meant to be the first in a series on cyber/information security. These posts will go through terms, concepts, primitives and make vague connections to real-world situations and algorithms. Heck, who knows, by the time I finish this series, quantum computing may have changed the cryptographic landscape enough where none of this matters anymore.


Being that this, while the introduction to the series, is actually the first post, I'll define some terms that will keep coming up as we go.

Privacy

At the core of basically all cyber/information security is the idea of privacy. Privacy is the concept that something can be done without someone else observing it, either because no one is looking or because, even if someone was looking, they would not be able to perceive what was happening.

You hear a lot about privacy in talks about end-to-end encryption, private messages online, browsing history/web requests, bank transactions, etc. Generally, privacy is very important, imagine if you weren't free to conduct a banking transaction in private. Then all the bad guys would have your password, and before you knew it, they'd have all your money too.

Security

Security, given the definition of privacy is both easier and harder to explain. In short, security is privacy, but concretely provable. Security is the enforcement of privacy internally. Privacy is what we want, security is privacy with extra steps, and the rest of these things provide the provability required for security.

Encryption

Encryption is the process of perturbing some data set such that the original data cannot be easily derived from the perturbed data. The original data, henceforth, will be called "plaintext". Further, the perturbed data will be called "ciphertext". Thus says the king.

An encryption algorithm takes at least two things, plaintext data and a key. The algorithm will output some other data as ciphertext. The important thing here is that one cannot derive the plaintext or key from the ciphertext. This means that you could write a murder confession and encrypt it, and publish the encrypted version in the Daily Bugle (or the Daily Planet if you're more a DC person) and NOT get caught. *So long as you also publish neither the plaintext nor the key by mistake. That would be bad.

That on it's own is not super useful though. An important feature of encryption is that it is reversible. This means that given some algorithm, a key, and a ciphertext, the original plaintext can be recovered.

So how's this useful? Imagine you have a way to derive a key, some function... a... Key Derivation Function? That will convert a text password to an encryption key... a Password Based Key Derivation Function?... Now, you use a password to derive a key, encryption some file with that key. Then later, you don't have to remember the key, just the password used to generate the key. Enter the password, derive the key again, and decrypt the file. *This is, by the way, how password based file encryption works. **This is an example of encrypting data at rest.

Now, imagine you and a friend have previously agreed on some key. When/how you agreed on that key is not important now, what is important is that you may encrypt a message to your friend and send the ciphertext. Then, because they have the key, the may decrypt the message to get the plaintext. All the while, your mortal enemies on the internet (probably people from the youtube comments) can't read what you're sending.

Authentication

Authentication relates to the origination of some set of information. The information in question could be a data stream from a website or a file that was emailed to you, or just an email. How do you know that it is actually from whoever it says it's from?

Authentication is the process by which you can be sure that, at some point, private information known only to a specific entity was used to sign off on some piece of information. It does not guarantee that the information was generated by that entity, just that their private info was used to generate a signature. In fact, much like your secretary (who has those anymore?) could draft a note for you to sign, so can signing authorities be used to rubber-stamp signatures. This isn't a bad thing though. Authentication is really important because if you don't know who you're talking to, it would be too easy to give out the password to your bank online. If you're using a modern web browser, there's likely a small lock icon in the URL bar next to the URL for this blog, that shows that the webpage you're viewing is both encrypted and authenticated.

Anonymous

Anonymity is a big deal recently. The idea is that you can do something online and not have your identity revealed. Think of anonymity the anti-authentication, where authentication is a provable way for you to say that a certain entity had contact with certain information, anonymity is plausible deniability.

In the modern world, anonymity is really hard to achieve properly, I might make a post speculating, war-gaming how one might achieve this goal, but there's not really a lot of provability because it's just a lot harder.

Validation/Data Integrity

This is the idea that you can take some large amount of information and somehow determine if it was changed. Usually, this is done with message digests (hashes, CRCs, checksums) or forward error correction (black magic). Data integrity is important in two general scenarios. The first is the simple case: You don't have a perfect data link. That is to say that by the nature of your connection, some data may change during transmission through no malicious means. Think of radio waves or loose wires. Happens all the time. It's good to know when your data is been corrupted. The second case is that someone has maliciously perturbed your data to have some effect. If someone is changing the destination of your rent payment before it gets to the bank, you definitely want the bank to be able to tell that it's been re-routed, this is where data integrity checks AKA validation come in.

*Random note here, if there's a random sentence here and there that end in semicolons -- ';' -- I'm sorry. I just caught myself doing that. This is the most I've typed outside of semicolon terminated text files in a long time.


Characters

What's life without a little fun?

In the world of security, we use examples and hypotheticals a lot. This means that it gets boring reading "Party A sends Party F another message with the pre-master secret derived from the previous message from Party E" type things. Instead, we use names.

The most common names are Alice and Bob. These are generic, friendly people in the examples. They're the ones who are trying to communicate securely. Eve is likely the third most common (Eve being short for Eavesdropper).

There are lots of names. Go over to wikipedia and check out /wiki/Alice_and_Bob for more information.


*Note how I didn't put a link in there? I try to avoid links. you can copy/paste that '/wiki/Alice_and_Bob' after typing 'https://www.wikipedia.com' and it will take you to the right place. The important thing is that there's hardly any way here that I could guide you to a malicious webside. I highly advise you do that too, if you want to share a video in an email/text message, just grab the video ID out of the URL, look after 'watch?v='. Whatever comes after that can be searched in youtube. The first result will be the video you're trying to share, don't believe me? Try searching this on youtube: dQw4w9WgXcQ


And that just about wraps it up. I hope I didn't bore you to sleep, or at least, if I did, that's why you came here in the first place. If you found this interesting enough to make it this far, stick around, or if you're from the future, read on; there's more to come...

Saturday, May 15, 2021

Introduction: Hi, I'm Ronman

 Good... Evening I guess?.


I'm Ronman (or Aaron to those who verbally address me) and this (gestures around) is my blog.


I'm going to start be being perfectly honest, I'm not sure exactly how often I'll post, how interesting it will be, etc. However, we are going to give this a try. By we, I mean, me. I'm going to do this.

For now, this is just a blog, however I have considered branching out into videos later on down the road, but I'm not promising anything.


Back to introduction.


My name is Aaron, however, as a young child, I would royally peeve my mother and aunt. When they would get mad, they would say "aaaRROOON", thus, I was given the nickname of "ronman".

A good while after that, I was introduced to programming. My father is a MS.NET developer and has been for many years. He introduced me to MS Visual Studio, developing a Windows Forms. My first application was a math helper. It showed various shapes and different formulae that could be worked to find area, perimeter, etc. It wasn't very polished at all, but it was the first thing I did, so I was happy with it.

A few years later, I was gifted a set with an Arduino Uno and various components. This sent me down a rabbit hole of programming in embedded applications. Now, this had a profound affect on me. I quickly learned a lot about the software side of Arduino and basic robotics.

After high school, I attended the University of Arkansas at Fayetteville. I received my Bachelors in Electrical Engineering (though I just forgot to add the minor for Computer Science/Engineering?).  Much of what I do is higher level than electrical engineering, but the decision to major as an EE was well thought out and I don't regret it one bit. It goes like this. At the time of selecting my major, I was considering a career as a robotics engineer. (Spoiler: That's not what I do.) The thought was that you didn't need to major in CS AND EE, but if you majored in either of them it would really help if you wanted a career in that area. Software development came naturally, I had no problem finding references online, and some of the EE stuff was.. magic? So EE.

I got my first internship between Jr and Sr year at a primarily communications company. I worked there essentially writing documentation for a set of VMs to be deployed together. It wasn't that fun, but it drove my change in career from robotics to security.

I went back to that same company after college (the next year) and started my career in security/cyber/reverse/forward/all-around engineering.

I've been working in security engineering for 5 years now. It's been a blast. Most of the time. It's had its ups and downs, there have been boring days where I've written documentation all day, and stressful days where I was first in and last out (like a stack, see?). Along the way, I've learned a lot. Hopefully, putting down some of the things I've learned will help someone out there.

In my free time, as at work, I write code. I also do more normal things like play video games, bicycle, pick locks, and apparently blog.


So. That's it.

Yet Another First Post

This is actually the second, first post that I have posted here. Apparently, blogger differentiates between blog posts and pages?.


So, here goes again.


This is a first post, yata yata, basically I just want to see what my page looks like. More to come, cool tech things, dense info, some random stuff too.


-ronman.

News Updates: MTE

 For anyone who reads that is actually tracking this issue, this won't be news, but I haven't seen that much buzz about it online an...