class: middle # .eight[CSET 155:] ## .eight[Database Design] --- class: middle # Joins --- class: middle # Agenda 1. [ ] [Basic Syntax](#syntax) 2. [ ] [Key Columns](#keys) 3. [ ] [Types of Joins](#joins) 4. [ ] [Using Joins in other Queries](#other) --- # Reading Data ```sql SELECT * FROM table_name; ``` Get all fields for all records from **one** table. --- name: syntax # Basic Syntax JOIN ... ON ```sql SELECT * FROM table_a JOIN table_b ON table_a.key_column = table_b.foreign_key_column; ``` Links one table to another by matching values contained in certain columns. --- # Basic Syntax Search left first, then right: ```sql FROM left_table JOIN right_table ``` What are these *keys*? ```sql ON key_column = foreign_key_column ``` --- # Basic Syntax Multiple tables means you have to specify which table a column belongs in: ```sql table_a.column_name -- versus table_b.column_name ``` --- class: middle, center # [Download the Code](https://raw.githubusercontent.com/anthonydb/practical-sql/master/Chapter_06/Chapter_06.sql) Look at Listing 6-1 --- name: keys # Key Columns New SQL keywords: - .eight[CONSTRAINT] - .eight[PRIMARY KEY] - .eight[UNIQUE] - .eight[REFERENCES] --- # Primary Key Column or collection of columns whose values uniquely identify each row in a table. - **Must** be unique - **Can't** be missing --- # Foreign Key Reference to a primary key in another table that must already exist. - **Can** be empty - **Can** have duplicates in a table --- # Constraints - Rules you create that your table must follow - Each rule has a name - Define primary and foreign keys - Learn more types later --- # Constraints Departments: ```sql CONSTRAINT dept_key PRIMARY KEY (dept_id) ``` Employees: ```sql dept_id integer REFERENCES departments (dept_id), CONSTRAINT emp_key PRIMARY KEY (emp_id) ``` --- # Unique - Not a key, but is a constraint - Guarantees unique values or combinations of values ```sql CONSTRAINT dept_city_unique UNIQUE (dept, city) CONSTRAINT emp_dept_unique UNIQUE (emp_id, dept_id) ``` --- # Run The Query ```sql SELECT * FROM employees JOIN departments ON employees.dept_id = departments.dept_id; ``` We've joined department data to the employee data on matching department IDs. --- name: joins # Types of Joins - Inner Join - Left Join - Right Join - Full Outer Join - Cross Join --- # Joins Defined By Example ```sql CREATE TABLE schools_left ( id integer CONSTRAINT left_id_key PRIMARY KEY, left_school varchar(30) ); CREATE TABLE schools_right ( id integer CONSTRAINT right_id_key PRIMARY KEY, right_school varchar(30) ); ``` Insert the data, see Listing 6-3. --- # Inner Join Get rows from both tables only on matches. ```sql SELECT * FROM schools_left JOIN schools_right ON schools_left.id = schools_right.id; ``` Also .eight[INNER JOIN] --- # Left Join All rows from the left table plus matches from right. ```sql SELECT * FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id; ``` Unmatched rows on right are empty. --- # Right Join All rows from the right table plus matches from left. ```sql SELECT * FROM schools_left RIGHT JOIN schools_right ON schools_left.id = schools_right.id; ``` Unmatched rows on left are empty. --- # Full Outer Join All rows from both tables, matches aligned. ```sql SELECT * FROM schools_left FULL OUTER JOIN schools_right ON schools_left.id = schools_right.id; ``` Unmatched rows are empty on both sides. --- # Cross Join Every possible combination of rows from both tables. ```sql SELECT * FROM schools_left CROSS JOIN schools_right; ``` Cross Join doesn't care about matches, so it doesn't use the .eight[ON] keyword. .eleven[Don't use Cross Join on large tables!!!] --- # Which Join? - Inner: most common; see matching data using well-structured database - Left/Right: you want to see missing data; you know some won't match - Full Outer: less common; visualize how two tables will merge - Cross: rare; see all combinations --- name: other # Filtering By NULL ```sql SELECT * FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id WHERE schools_right.id IS NULL; ``` Or the opposite using .eight[IS NOT NULL] --- # Specifying Columns Try this: ```sql SELECT id FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id; ``` Which .eight[id] are you going to get? --- # Specifying Columns ```sql SELECT schools_left.id, schools_left.left_school, schools_right.right_school FROM schools_left LEFT JOIN schools_right ON schools_left.id = schools_right.id; ``` When there's ambiguity, prepend the column name with the table name: --- # Simplify With Aliases ```sql SELECT lt.id, lt.left_school, rt.right_school FROM schools_left AS lt LEFT JOIN schools_right AS rt ON lt.id = rt.id; ``` The .eight[AS] keyword creates an alias you can use within the query. --- # Joining Multiple Tables Let's add two more tables to the mix: - enrollment for each school - grade levels taught in each school Look at Listing 6-12 to create the tables and insert the data. --- # Joining Multiple Tables ```sql SELECT lt.id, lt.left_school, en.enrollment, gr.grades FROM schools_left AS lt LEFT JOIN schools_enrollment AS en ON lt.id = en.id LEFT JOIN schools_grades AS gr ON lt.id = gr.id; ``` --- # .fourteen[Exercises] Look at Listing 6-13: - Create a second US Counties table with census data from 2000 - Import data using the [CSV file here](https://raw.githubusercontent.com/anthonydb/practical-sql/master/Chapter_06/us_counties_2000.csv) - Run the SELECT query. .fourteen[What is it doing?] --- # .fourteen[Exercises] Which county had the greatest percentage loss of population between 2000 and 2010? *Bonus points if you can figure out why.* --- # .fourteen[Exercises] The 2010 census data has 3,143 rows and the 2000 census data has 3,141. Using our knowledge of Joins and filtering by NULL, find out which counties don't exist in both tables. For the curious, look each of them up online and see why they don't exist anymore, or why they were created. --- # .fourteen[Exercises] Go back to the Math lecture slides and look at the median problem. Find the median of the percent change in county population from 2000 to 2010.