Saturday, May 22, 2021

Security Concepts: Introduction

I'm sure there are lots of places that have this information. In fact, I'm 100% sure, because I'm not quite smart enough to have come up with any of this early enough to have my name on papers. I read this somewhere, and you could surely read it in the same places. I am writing this, as will be my norm, to condense a set of knowledge (that I think might be helpful to someone) in a single place.


So, who the heck is meant to consume this information? First off, whoever wants to read it. Second, the target audience is a relatively technical group, not unfamiliar with programming concepts...

Ok, really math concepts, but also, we're not really talking math... The types of things I'm talking about are functions, numeric bases other than 10 (specifically base 2), boolean logic/algebra, the idea that everything in math/computers is a number. That's basically it. If you generally understand that, skip this next little bit (to the poop). If you're a little fuzzy on one or two of those, this next section is for you...


What's a function?

A function is a process that takes a specific set of inputs to yield a specific set of outputs, generated with respect to the given inputs. If you start to think about it, functions are everywhere. Fast food? What you order decides what they give you. Fast food has a functional interface. Vending machine? Press buttons as input, get snacks. Functional interface. Important thing to take away is that the output of a function is defined by the inputs.
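If code is easier to picture than fast food, here is a minimal sketch of the vending machine as a Python function. The button codes and snacks are made up for illustration; the only point is that the same input always gets you the same output.

```python
# Hypothetical vending machine: the button codes and snacks are invented.
SNACKS = {"A1": "chips", "A2": "pretzels", "B1": "candy bar"}

def vending_machine(button: str) -> str:
    """Return the snack for a button press; the output depends only on the input."""
    return SNACKS.get(button, "nothing")

print(vending_machine("A1"))  # chips
print(vending_machine("A1"))  # chips again: same input, same output
print(vending_machine("Z9"))  # nothing
```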


What are "numeric bases other than 10"? (And why base 2?)

MOST. IMPORTANT. THING. Numbers don't have bases. REAL numbers, like the actual concept of 4, can be written in many ways, but each still represents the same thing. You and I can picture 4 items in our heads and be confident that we're talking about the same number of individual things. REPRESENTATIONS of numbers have bases. Essentially, the number of symbols used to define the set of representable numbers within an order of magnitude is the base of the representation. We're, as humans, most familiar with "base 10". It's called 'base 10' because we actually use 10 symbols: 1, 2, 3, 4, 5, 6, 7, 8, 9, AND 0. (Weird side note, the base of a number system is written in base-10. Because we're humans.)

So. The representation of each number is thought to have an infinite number of leading 0's (0 being the smallest symbol, representing a lack of anything). When counting, add 1 to the smallest end of the number (the right). If that digit is already at the max, set it to 0 and add 1 to the next digit to the left. Repeat this same logic as you go up: if the second digit is already at the max, set it to 0 and add 1 to the third digit, and so on.

With this info, you can now count in any base you want.

Base 10: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...)

Base 7: (1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 20, ...)

What about base 1? (1, 11, 111, 1111, 11111, 111111, 1111111, ...)

*Base 1 is a little weird. If you go with the idea that there are 0's preceding the significant digits, base 1 doesn't really work (its one and only symbol is already in use, so there's no 0 to pad with), but whatever.

Base 2: ...

Wait, I said that base 2 was important, right? Base 2, also known as binary, is important because in modern computers, transistors allow us to track the state of things as OFF or ON. 0, or 1... See where I'm going with this? I'll skip all the stuff involved in the 3-5 undergraduate classes from which I derive that idea and just give you the answer: all numbers on your computer are, at their core, represented, operated on, saved, written, and interpreted in base 2.

Computers count like this: beep boop beep boop.

(Translated from computer:) 1, 10, 11, 100, ...

100(base2) is the same value as 4(base10).

Further note on this: most of the time, programmers et al. don't use binary to represent numbers. We always know that they're being handled as binary under the hood, but it's not efficient to represent them in written text as such. Humans take a hybrid approach to efficiency and readability and use something called "hexadecimal" or base-16. Now, there are only 10 Arabic numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). So how do we have 16? We cheat. F comes after E comes after D comes after C comes after B comes after A comes after 9. So the sequence of numbers in an order of magnitude is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, etc.
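If you have Python handy, its built-in functions will switch between these representations for you. A quick sketch (the variable names are mine, nothing official):

```python
four = int("100", 2)             # read "100" as a base-2 representation
print(four)                      # 4
print(bin(4))                    # 0b100 (the 0b prefix marks binary)

two_fifty_five = int("FF", 16)   # read "FF" as a base-16 representation
print(two_fifty_five)            # 255
print(hex(255))                  # 0xff (the 0x prefix marks hexadecimal)

eight_bits = format(255, "08b")  # the same value written as 8 binary digits
print(eight_bits)                # 11111111
```

Part of why hexadecimal is the favorite shorthand: 16 is 2 to the 4th power, so each hex digit stands for exactly four binary digits. That's why FF and 11111111 line up so neatly.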


Boolean Logic/Algebra? Is this a math post?

No... But everything is actually math...

For real, boolean logic is the set of operations (or functions????) that can operate on bits (a bit is a 1-digit binary number) or on larger binary numbers. These functions are: NOT, AND, OR, XOR, NAND, NOR, XNOR.

NOT: Takes one bit and inverts it:

    (1) -> 0, (0) -> 1

AND: Takes 2 bits and outputs a 1 if both inputs are 1:

    (1, 1) -> 1, (otherwise) -> 0

OR: Takes 2 bits and outputs a 1 if either or both bits are 1, or, stated differently, outputs a 0 only if both bits are 0:

    (0, 0) -> 0, (otherwise) -> 1

XOR: Takes 2 bits and outputs a 1 if exactly 1 of the inputs is a 1:

    (1, 0) -> 1, (0, 1) -> 1, (otherwise) -> 0

NAND: AND, but the output is inverted with NOT

NOR: OR, but the output is inverted with NOT

XNOR: XOR, but the output is inverted with NOT
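Here is a small Python sketch of those seven operations on single bits, using Python's bitwise operators (&, |, ^). The function names simply mirror the list above; Python doesn't ship functions with these names.

```python
def NOT(a: int) -> int:
    return 1 - a              # flips a single bit: 1 -> 0, 0 -> 1

def AND(a: int, b: int) -> int:
    return a & b              # 1 only if both inputs are 1

def OR(a: int, b: int) -> int:
    return a | b              # 0 only if both inputs are 0

def XOR(a: int, b: int) -> int:
    return a ^ b              # 1 if exactly one input is 1

def NAND(a: int, b: int) -> int:
    return NOT(AND(a, b))

def NOR(a: int, b: int) -> int:
    return NOT(OR(a, b))

def XNOR(a: int, b: int) -> int:
    return NOT(XOR(a, b))

# Print the full truth table for the two-input operations.
print("a b | AND OR XOR NAND NOR XNOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "|", AND(a, b), OR(a, b), XOR(a, b),
              NAND(a, b), NOR(a, b), XNOR(a, b))

# The same operators work on larger binary numbers, one bit position at a time.
print(bin(0b1100 & 0b1010))   # 0b1000
print(bin(0b1100 ^ 0b1010))   # 0b110
```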


That's as mathy as these posts are likely to get right there. All that computers do can be boiled down to those operations.


"Everything is a number" sure sounds like math to me...

You got me there. But follow me for a minute. Remember, above, I talked about the most important thing being that REPRESENTATIONS of numbers are the things that have bases? Well, your computer has 0 clue what a cat is. It doesn't understand what socks are. Or what the alphabet is, for that matter. But it can act on the representations of those things. The representation of a picture of a cat boils down to a large set of numbers. Don't believe me? Well, if you look up a picture of a cat, and look really closely, you'll find it's divided into little rectangles called pixels. Each pixel is colored by mixing 3 colors (typically red, green, and blue). The intensity of each color in a single pixel is given by a number. So each pixel is made of numbers, each picture is made of pixels, so each picture (by extension) is made of numbers. Same thing with pictures of socks; there's nothing special about cat pictures. Letters are a little different, in that there's a different style of representing them, but letters are still represented to the computer as numbers. Historically, that representation was usually something called ASCII, but now it's usually Unicode. (This isn't a post about text encoding so don't worry, I don't even know what ASCII stands for.)
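Still don't believe me? Here's a tiny Python sketch. The pixel values are a made-up example (one orange-ish pixel), and the rest just asks Python which numbers sit behind some text.

```python
# A hypothetical pixel: red, green, and blue intensities, each from 0 to 255.
pixel = (255, 165, 0)

# A (very small) hypothetical picture: 2 rows of 2 pixels each.
picture = [
    [(255, 165, 0), (0, 0, 0)],
    [(0, 0, 0), (255, 165, 0)],
]

# Letters are numbers too.
letter_number = ord("A")                 # the number behind the letter A
print(letter_number)                     # 65
print(chr(65))                           # A

# The word "cat", as the numbers actually stored when encoded as UTF-8.
cat_bytes = list("cat".encode("utf-8"))
print(cat_bytes)                         # [99, 97, 116]
```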


If you've made it this far and feel like continuing, great. You're either nodding along or wishing you'd skipped to the poop before. Either way, you've made it. I'm proud of you.


💩💩💩💩💩💩💩💩💩💩💩💩💩💩


This post is meant to be the first in a series on cyber/information security. These posts will go through terms, concepts, and primitives, and make vague connections to real-world situations and algorithms. Heck, who knows, by the time I finish this series, quantum computing may have changed the cryptographic landscape enough that none of this matters anymore.


Being that this, while the introduction to the series, is actually the first post, I'll define some terms that will keep coming up as we go.

Privacy

At the core of basically all cyber/information security is the idea of privacy. Privacy is the concept that something can be done without someone else observing it, either because no one is looking or because, even if someone was looking, they would not be able to perceive what was happening.

You hear a lot about privacy in talks about end-to-end encryption, private messages online, browsing history/web requests, bank transactions, etc. Generally, privacy is very important. Imagine if you weren't free to conduct a banking transaction in private. Then all the bad guys would have your password, and before you knew it, they'd have all your money too.

Security

Security, given the definition of privacy, is both easier and harder to explain. In short, security is privacy, but concretely provable. Security is the enforcement of privacy internally. Privacy is what we want, security is privacy with extra steps, and the rest of these things provide the provability required for security.

Encryption

Encryption is the process of perturbing some data set such that the original data cannot be easily derived from the perturbed data. The original data, henceforth, will be called "plaintext". Further, the perturbed data will be called "ciphertext". Thus says the king.

An encryption algorithm takes at least two things: plaintext data and a key. The algorithm will output some other data as ciphertext. The important thing here is that one cannot feasibly derive the plaintext or the key from the ciphertext alone. This means that you could write a murder confession, encrypt it, publish the encrypted version in the Daily Bugle (or the Daily Planet if you're more of a DC person), and NOT get caught. *So long as you also publish neither the plaintext nor the key by mistake. That would be bad.

That on its own is not super useful though. An important feature of encryption is that it is reversible. This means that given some algorithm, a key, and a ciphertext, the original plaintext can be recovered.
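To make the plaintext/key/ciphertext relationship concrete, here is a minimal sketch in Python. It's a one-time-pad style cipher built on XOR: the key is random, exactly as long as the message, and must never be reused. I picked it because it fits in a few lines, not because it's what real software uses; the function name and message are made up.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(data) != len(key):
        raise ValueError("for this sketch the key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"I did it."
key = secrets.token_bytes(len(plaintext))   # random key, used exactly once

ciphertext = xor_bytes(plaintext, key)      # encrypt
recovered = xor_bytes(ciphertext, key)      # decrypt: XOR with the same key undoes it

print(ciphertext.hex())                     # random-looking; differs on every run
print(recovered == plaintext)               # True
```

Encrypting and decrypting are the same operation here because XOR-ing with the same value twice cancels out. Most real ciphers have separate encrypt and decrypt steps, but the reversibility idea is the same.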

So how's this useful? Imagine you have a way to derive a key, some function... a... Key Derivation Function? That will convert a text password to an encryption key... a Password-Based Key Derivation Function?... Now, you use a password to derive a key, and encrypt some file with that key. Then later, you don't have to remember the key, just the password used to generate the key. Enter the password, derive the key again, and decrypt the file. *This is, by the way, how password-based file encryption works. **This is an example of encrypting data at rest.
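Python's standard library happens to include one such function, PBKDF2, as hashlib.pbkdf2_hmac. Here's a hedged sketch of the "derive it now, derive it again later" idea. The password, the iteration count, and the helper name derive_key are my own illustrative choices. The salt is a random value you'd store next to the encrypted file; it doesn't need to be secret, but PBKDF2 wants one.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Turn a text password into a 32-byte key using PBKDF2 with HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = os.urandom(16)        # stored alongside the encrypted file, not secret

# Today: derive a key from the password and use it to encrypt the file.
key_today = derive_key("hunter2 but longer", salt)

# Later: no need to remember the key, just re-enter the password.
key_later = derive_key("hunter2 but longer", salt)
key_wrong = derive_key("not my password", salt)

print(key_today == key_later)   # True: same password and salt, same key
print(key_today == key_wrong)   # False: the wrong password gives a different key
```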

Now, imagine you and a friend have previously agreed on some key. When/how you agreed on that key is not important now; what is important is that you may encrypt a message to your friend and send the ciphertext. Then, because they have the key, they may decrypt the message to get the plaintext. All the while, your mortal enemies on the internet (probably people from the YouTube comments) can't read what you're sending.

Authentication

Authentication relates to the origination of some set of information. The information in question could be a data stream from a website or a file that was emailed to you, or just an email. How do you know that it is actually from whoever it says it's from?

Authentication is the process by which you can be sure that, at some point, private information known only to a specific entity was used to sign off on some piece of information. It does not guarantee that the information was generated by that entity, just that their private info was used to generate a signature. In fact, much like your secretary (who has those anymore?) could draft a note for you to sign, so can signing authorities be used to rubber-stamp signatures. This isn't a bad thing though. Authentication is really important because if you don't know who you're talking to, it would be too easy to give out the password to your bank online. If you're using a modern web browser, there's likely a small lock icon in the URL bar next to the URL for this blog. That shows that your connection to the webpage you're viewing is both encrypted and authenticated.
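Signatures of the kind described above are usually built on public-key cryptography, which Python's standard library doesn't include. As a stand-in, here's a sketch using a different technique, HMAC, where the sender and receiver share one secret instead of the signer holding a private key. The secret, messages, and function names are made up. The idea it demonstrates is the same though: a tag that could only have been produced by someone holding the secret.

```python
import hashlib
import hmac

def make_tag(secret: bytes, message: bytes) -> str:
    """Produce an authentication tag for a message using a shared secret."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def check_tag(secret: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in a timing-safe way."""
    return hmac.compare_digest(make_tag(secret, message), tag)

shared_secret = b"agreed on earlier, never sent in the clear"  # hypothetical
message = b"Meet at noon."
tag = make_tag(shared_secret, message)

print(check_tag(shared_secret, message, tag))             # True: the secret was used
print(check_tag(shared_secret, b"Meet at midnight.", tag)) # False: message changed
print(check_tag(b"some guess", message, tag))              # False: wrong secret
```

Just like the secretary example, a valid tag doesn't tell you who typed the message, only that whoever tagged it had the secret.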

Anonymity

Anonymity is a big deal recently. The idea is that you can do something online and not have your identity revealed. Think of anonymity as the anti-authentication: where authentication is a provable way for you to say that a certain entity had contact with certain information, anonymity is plausible deniability.

In the modern world, anonymity is really hard to achieve properly. I might make a post speculating on, or war-gaming, how one might achieve this goal, but there's not really a lot of provability because it's just a lot harder.

Validation/Data Integrity

This is the idea that you can take some large amount of information and somehow determine if it was changed. Usually, this is done with message digests (hashes, CRCs, checksums) or forward error correction (black magic). Data integrity is important in two general scenarios. The first is the simple case: you don't have a perfect data link. That is to say that by the nature of your connection, some data may change during transmission through no malicious means. Think of radio waves or loose wires. Happens all the time. It's good to know when your data has been corrupted. The second case is that someone has maliciously perturbed your data to have some effect. If someone is changing the destination of your rent payment before it gets to the bank, you definitely want the bank to be able to tell that it's been re-routed. This is where data integrity checks, AKA validation, come in.
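As a rough sketch of the digest idea, here is Python computing a SHA-256 hash and a CRC-32 checksum of a message before and after a single character changes. The rent message and account numbers are invented.

```python
import hashlib
import zlib

original = b"Pay rent to account 12345"
changed = b"Pay rent to account 12346"   # one character is different

def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hash of the data as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()

# A digest is short no matter how big the data is, and a change in the
# data shows up as a different digest.
print(sha256_digest(original))
print(sha256_digest(changed))
print(sha256_digest(original) == sha256_digest(changed))   # False

# A CRC is a cheaper check, good for catching accidental corruption.
print(zlib.crc32(original) == zlib.crc32(changed))         # False
```

One caveat: a bare hash or CRC only catches accidents. Someone who is changing your data on purpose can just recompute the digest to match, so for the malicious case the digest itself has to be protected, for example by mixing in a secret key.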

*Random note here, if there's a random sentence here and there that ends in a semicolon -- ';' -- I'm sorry. I just caught myself doing that. This is the most I've typed outside of semicolon-terminated text files in a long time.


Characters

What's life without a little fun?

In the world of security, we use examples and hypotheticals a lot. This means that it gets boring reading "Party A sends Party F another message with the pre-master secret derived from the previous message from Party E" type things. Instead, we use names.

The most common names are Alice and Bob. These are generic, friendly people in the examples. They're the ones who are trying to communicate securely. Eve is likely the third most common (Eve being short for Eavesdropper).

There are lots of names. Go over to Wikipedia and check out /wiki/Alice_and_Bob for more information.


*Note how I didn't put a link in there? I try to avoid links. You can copy/paste that '/wiki/Alice_and_Bob' after typing 'https://www.wikipedia.com' and it will take you to the right place. The important thing is that there's hardly any way here that I could guide you to a malicious website. I highly advise you do that too. If you want to share a video in an email/text message, just grab the video ID out of the URL: look after 'watch?v='. Whatever comes after that can be searched on YouTube. The first result will be the video you're trying to share. Don't believe me? Try searching this on YouTube: dQw4w9WgXcQ


And that just about wraps it up. I hope I didn't bore you to sleep, or at least, if I did, that's why you came here in the first place. If you found this interesting enough to make it this far, stick around, or if you're from the future, read on; there's more to come...
