This is a transcript from the video mentioned above: (please support the original creator by liking and subscribing to their channel!)
Hello, this is dr. Greg Godwin, Ferris State University. And today we’re going to go through database design. The first thing we want to cover is what is a database well. A database is an organized collection of data. The key is that it is organized without the organization. The data may not be of use. So when you talk about a database, we’re talking about organized data. Now there are several different types of databases or database management systems, relational databases like Oracle, Microsoft sequel server and DB2 data warehousing databases, no SQL databases, There’s many subtypes of no SQL databases. Mongodb is a very popular one right now. There’s a lot of document databases and graphic databases that are no SQL databases, object databases, and then there’s many other kinds such as heretical and some of the older structures now for our purposes were interested in relational databases. Relational databases are transaction databases where the relationships are defined by primary and foreign keys for the most part. Now, when we talk about transactional databases, those would be things where you’re doing work such as purchasing something where there’s a lot of updating, inserting deleting of records, things like that that the typical manufacturing databases or a product database or maybe a database that supports a cash register or something like that now. The data in a relational database is contained in tables and they look a lot like an Excel spreadsheet now. Excel is not a database, but one table in a relational database looks kind of like a spreadsheet in Excel. So I’m going to go through a few different database design lines. First, I’ll go through a poor database design, and then we’ll go through one that is much more ready for the real world. If you will now this tool that I have is Microsoft Visio and it is often used for database design, but it’s basically used for a lot of different things, including making electrical diagrams and things like that, so it’s a very much a multi-purpose piece of software we’re going to start out with the customer table. There’s three tables in this database, customer order and product now in the customer table. The primary key is the customer. I’d now a primary key is what identifies a record in a table. It is not an index and indexes for speed. That’s a different structure. Now it is common that keys and indexes are operating on the same field, but they’re not the same structure now. I have these little green boxes that talk about some of the challenges with this database design, and there’s really three normal forms or three rules that you often go by to correctly design a database. You might read a database textbook and it talks about the five normal forms or seven or nine or something like that. The reality is, is that if you go to three normal forms, you’re probably in pretty good shape, and I was a lead DBA in a multi-billion dollar organization for a number of years and I never saw a situation that I can think of where three normal forms didn’t handle every situation, so in the customer table, let’s go through some of the challenges. First thing customer name customer name is a violation of first normal form in the first normal form is there is no repeating groups in the data. Now you say well, what’s a repeating group? Well, in the customer name field first name and last name would both end up being put into that field. Is it First-name Lastname? Is it going to be last name comma first name? What if there’s a middle name? What if there’s two last names, a hyphenated name, all different variations and you can see that trying to put all of that into one field is going to cause some problems so first normal form no repeating groups, we want the data to be atomic or broke down into the individual structures, another similar situation That is another kind of repeating group is where you see this phone number. One phone number two. The reason for that is to allow one or two phone numbers in the customer table, so you might have a home phone a cell phone, a work phone alternate phone, something like that, however, that’s really not the best way to handle it. Sometimes it’s a compromise in the design, but still, it’s something that raises a little bit of concern another problem in the customer table, right below that the last field is called employer name now. Employer name is not directly related to the primary key. The primary key is customer. Id, and that’s all the customer information. The employer name belongs in an employer table. Now the employer name might be the name of the employer that the person works for, but again it does not belong in the customer table, so that’s a violation of the second normal form, where the data doesn’t depend on the key. If we go over to the order and product tables, you might notice the product name and it’s in dark bold is in both the order table in the product table. That is a violation of the third normal form, which is no redundant. Data don’t want the same data in more than one location, it takes up extra space and also its possibility of not keeping it in sync. You might update the product name and one of the tables and forget to do it in another table, so you don’t want to have product name in both of those tables? If we further focus on the order table, you can see that it is not directly tied to the customer. There’s no customer. Id, and that was would be how you would define the relationship. Be the primary key would want to put in a customer ID into the order table in some fashion to tie the customer to the order as this currently stands. You have no idea who placed the order now. If you look at the order table, you’ll see a product ID. And that ties the product in order together. However, you can only order one product in a given order. So sometimes if you put some data into the tables, it makes it easier to see that so visually, Let’s say it’s order number 123 and we ordered product 16 Well, once you put 16 into the product. I’d in that order table where you’re going to put product. I’d 17 you. You can’t without creating another order, so that’s a problem with our current arrangement, and this is just a poor design, so let’s move on to something. That’s a little bit more solid and this is a normalized database structure will again start with the customer table and in the customer table. You can see with the kind of red colored boxes next to the tables. Some of the changes. Well, first of all we made named atomic. We put in a last name and a first name and that eliminates the challenge of is it going to be last name comma first name, first name last name that type of thing in the previous design. We also removed the employer name. If for some reason we wanted to by the customer to an employer, we could put an employer. I’d into the customer table and that would link us to the customers to the employer database table. Now, phone number. We split out and we have the customer. ID, area code and phone number is all a combination for a primary key. We took phone number and broke area code out of it, so we made that even more atomic now customer. I’d, it depends on the way you’re going to use the application. The customer, I’d can be part of the primary key or it can be just another field that defines the relationship as a foreign key into the customer. Table depends on how you’re going to use it and a lot of times with design. You’ve got to ask questions and what are the advantages and disadvantages? This way very much blocks a particular phone number to a customer so could eliminate problems down the road with perhaps it will at least flag duplicate phone numbers tied to a different customers and things like that. Let’s move over to the order table and the product’s table. You can see the order table. I took the product. I’d completely out of the order table. Now we want to be able to order many products in one order, and we also want to be able to have the same product be in many orders. So in other words, one order can have many products in a product can be in many orders is a many-to-many a common way to allow. This is to put a table in between which I call a bridge table, and I usually name that the same as the two tables that it’s between, so I use order and products and make that the order products table and the key is the key of the two tables it’s between and when you have more than one field as part of the primary key, that is what we call a compound, a composite key and the bridge table, which is the order products, results the many-to-many and allows you to have one order with many products and one product and go into many orders. So let’s go back and let’s say it’s order Number 123 Well, if we go to the order products table, we put in order 123 and then the product. I’d is 16 and then if we want to put in the next product, we can go order 123 and the product. I’d is 117 or something like that. And that combination allows us to have many products into that particular order. One other thing that we did is we put the customer. I’d as a foreign key in the order table and that ties the order to a particular customer, so we have a much stronger design than what we had previously, so we utilize the three normal forms. Let’s go into sequel server and take just a little bit deeper. Look and see what else we can find what I did In sequel server is to find that customer table and you can see customer. I’d has a little yellow icon next to it. That’s the primary key, and when we define fields, we use what we call data types as to what kind of data goes into it, you know, Numeric data, alphanumeric data, numbers, dates and things like that. Well, in this particular instance, the customer. I’d is an integer. An integer is a range of values, then on which version of the DBMS you’re using. It can be anything from, say negative two billion to positive two billion or something like that. It’s not a fixed number of digits or a fixed number of positions. You look to the third column there where it says allow nulls and there are check boxes. Well, if you check the box that means it would allow nulls below a null is no value. If you have a record in a table, you certainly need the key. So the key is required. Well, we do not want to have allow null in a key field because that would mean you could enter a record without having a key and then you have orphan data and dirty data and things like that, so we always require the value for the key and usually good normalized databases require every field be filled in, however, that’s more of a textbook thing than a practical use corporate thing, because as you maintain databases, you might add fields and there might be historical data that you cannot go back and recapture to fill in so a lot of times. It’s a more of a go forward where you start capturing that data. If you look at the first name, we have a var car 20 That’s a variable character up to 20 so anything up to 20 characters will fit in there. If you try to put in 21 a character is going to be truncated. Last name is 30 same thing anything up to 30 if we go down to zip zip has a end car of 10 and that means it’s a Unicode Unicode is the double white character set and a double byte character set is something that will allow large numbers of alphabetical characters, so for example, in Asia, You have alphabets that have over 50,000 different symbols. You could use this database if it had an N car, but you can’t if it’s just a regular char because it’s a single bite versus a double bite. You don’t have as much space to put that in now. Last thing to point out the way I have. These names like customer ID. Customer is a Capital C and Id capital or capital. F for first. And that is an abbreviation capital. N for name, those are what we call double bite or excuse me. Camel case, zip. The reality is, it’s probably better to go zip code or depending on how your Bre, Va Shions goes. Zip code Camel Case is where you capitalize the first character of each word in the name. So that is most of what? I wanted to talk about one last thing. Let’s look at what the data is like in a database table. We have five records or five rows of data. You might hear database textbook, call it. A couple referred to a row as a tupple in all my years in corporate. I never heard any programmer or database administrator. Really talk about tuples. Everybody calls them rows or records. If you look at everything in the first name column, all five values, we call that a column, Okay, then the first name column of data. And if you’re looking at one particular cell, that’s typically a via field, so you got a record or a row. You have columns of data and you have a field of data, so I hope that gives you some insight into database design and we will look at some other things.