Joins
This intends to provide a brief summary of JOIN operations. I’m attempting to follow standard SQL here where possible, thus avoiding, say, Oracle terminology like ANTIJOIN and SEMIJOIN. All SQL databases that I’ve encountered perform LEFT/RIGHT/INNER JOIN as expected, but others will show some variation. When in doubt, refer to your documentation.
The generic JOIN procedure uses the following syntax:
SELECT column1, column2 FROM table1 JOIN table2 ON table1.column3=table2.column3 ORDER BY table1.column1
table1 is considered the “left” table, while table2 is the “right” table. Think of them as two pages in a book, side by side. There will be some similarities in wording between them, and these can be the basis for connecting the rows on the pages.
LEFT JOIN returns all the rows from the left table, even if there’s no match in the right table. In this case, nonmatched entries in the right table are still present in the result set but given null values as placeholders. This is synonymous with LEFT OUTER JOIN.
RIGHT JOIN returns all the rows from the right table, even if there’s no match in the left table. In this case, nonmatched entries in the left table are still present in the result set but given null values as placeholders. This is synonymous with RIGHT OUTER JOIN.
INNER JOIN returns rows from both tables only when there’s a match between them. Any records with null values in either table are excluded from the result set. Using the JOIN keyword by itself is equivalent to INNER JOIN, as is joining two tables using a WHERE table1.column1=table2.column1 clause.
FULL JOIN is the opposite of INNER JOIN: it gives you all the rows from either table with the corresponding null values.
OUTER JOIN is equivalent to FULL OUTER JOIN. This gives you all rows in both the left and right tables, with null values present in both.
CROSS JOIN gives you a result set consisting of each row in the left table combined with every row in the right table — a Cartesian product. If you limit CROSS JOIN with the WHERE clause, it becomes equivalent to an INNER JOIN because the result set will exclude all non-matching ordered pairs.
NATURAL JOIN is similar to INNER JOIN but won’t duplicate columns with identical names. SELECT * FROM table1 INNER JOIN table2 ON table1.record_id=table2.record_id; prints the record_id column twice, but SELECT * FROM table1 NATURAL JOIN table2; gives you record_id only once.
STRAIGHT JOIN, specific to MySQL, behaves similarly to INNER JOIN but forces MySQL to read the left table first in the event the MySQL query optimizer is placing the tables in the wrong order. Generally this isn’t necessary or recommended, and STRAIGHT JOIN is only listed here because I’ve needed it a couple of times.
Joining Multiple Tables
To see a table of customers, contacts, contact names, order dates, and products from the sample databases at the W3Schools SQL Tester, you can do something like this:
SELECT Customers.CustomerID, CustomerName, ContactName, Orders.OrderID, OrderDate, ProductName, Unit FROM Customers INNER JOIN Orders ON Customers.CustomerID=Orders.CustomerID INNER JOIN Shippers ON Orders.ShipperID=Shippers.ShipperID INNER JOIN OrderDetails ON Orders.OrderID=OrderDetails.OrderID INNER JOIN Products ON Products.ProductID=OrderDetails.ProductID ORDER BY Customers.CustomerID
(Remember, INNER JOIN excludes any null values, which won’t always be what you want.)
Various Tricks and Examples
Most of the time I’d suggest manipulating query results with Python or another general-purpose programming language, but here are some silly SQL tricks. All apply to MySQL unless otherwise indicated. It always pays to check your RDBMS documentation as there may be a function that does exactly what you want.
Show records with date fields ranging from today to two weeks into the future:
SELECT e.event, e.location, e.startdate FROM events e WHERE e.startdate BETWEEN CURDATE() and DATE_ADD(CURDATE(), INTERVAL 13 DAY) ORDER BY e.startdate;
Show the difference in days between two dates:
SELECT DATEDIFF(event1_date, event2_date) FROM ( SELECT date AS event1_date FROM events WHERE event_name = 'Blobfish Appreciation Committee' ) x, ( SELECT date as event2_date FROM events WHERE event_name = 'Swing Dancing Lessons' ) y
Ranking results from smallest to largest based on an integer field:
SELECT (SELECT COUNT(DISTINCT t2.numbers) FROM table2 t2 WHERE t2.numbers <= t1.numbers) AS rank, t1.numbers FROM table1 t1
Select the min and max integer values from a column:
SELECT patient, MIN(age) AS min_age, MAX(age) AS max_age FROM patients GROUP BY patient
Show the running total of an integer field:
SELECT e.event_name, e.event_attendance, (SELECT SUM(f.event_attendance) FROM events f WHERE f.event_id <= e.event_id) AS running_total FROM events e
Skip every even row in a result set (show 1st, 3rd, 5th, etc.):
SELECT x.event_name FROM ( SELECT e.event_name, (SELECT COUNT(*) FROM events f WHERE f.event_name <= e.event_name) AS ename FROM events e ) x WHERE MOD(x.ename, 2) = 1
Extract the initials from a name:
SELECT CASE WHEN name_count = 2 THEN TRIM(TRAILING '.' FROM CONCAT_WS('.', SUBSTR(SUBSTRING_INDEX(name, ' ', 1), 1, 1), SUBSTR(name, LENGTH(SUBSTRING_INDEX(name, ' ', 1))+2, 1), SUBSTR(SUBSTRING_INDEX(name, ' ', -1), 1, 1), '.')) ELSE TRIM(TRAILING '.' FROM CONCAT+WS('.', SUBSTR(SUBSTR_INDEX(name, ' ', 1), 1, 1), SUBSTR(SUBSTR_INDEX(name, ' ', -1), 1, 1) )) END AS initials FROM ( SELECT name, LENGTH(name)-LENGTH(REPLACE(name,' ' , '')) AS name_count FROM ( SELECT REPLACE('Michael Schumacher', '.' ,'') AS name FROM table1 )y )x
Find the average (mean) or median of a list of figures:
-- average/mean SELECT name, AVG(age) AS avg_age FROM patients -- median SELECT AVG(age) FROM ( SELECT p.age FROM patients p, patients q WHERE p.name = q.name AND p.name LIKE '%Lohan' GROUP BY p.age HAVING SUM( CASE WHEN p.age = q.age THEN 1 ELSE 0 END) >= ABS(SUM(SIGN(p.age - q.age))) )
Create a horizontal histogram, i.e., a sideways bar graph:
SELECT department, LPAD('*', COUNT(*), '*') AS patient_count FROM patients GROUP BY department