Stupid SQL Tricks

Joins

This intends to provide a brief summary of JOIN operations. I’m attempting to follow standard SQL here where possible, thus avoiding, say, Oracle terminology like ANTIJOIN and SEMIJOIN. All SQL databases that I’ve encountered perform LEFT/RIGHT/INNER JOIN as expected, but others will show some variation. When in doubt, refer to your documentation.

The generic JOIN procedure uses the following syntax:

SELECT column1, column2
FROM table1
JOIN table2
ON table1.column3=table2.column3
ORDER BY table1.column1

table1 is considered the “left” table, while table2 is the “right” table. Think of them as two pages in a book, side by side. There will be some similarities in wording between them, and these can be the basis for connecting the rows on the pages.

LEFT JOIN returns all the rows from the left table, even if there’s no match in the right table. In this case, nonmatched entries in the right table are still present in the result set but given null values as placeholders. This is synonymous with LEFT OUTER JOIN.

RIGHT JOIN returns all the rows from the right table, even if there’s no match in the left table. In this case, nonmatched entries in the left table are still present in the result set but given null values as placeholders. This is synonymous with RIGHT OUTER JOIN.

INNER JOIN returns rows from both tables only when there’s a match between them. Any records with null values in either table are excluded from the result set. Using the JOIN keyword by itself is equivalent to INNER JOIN, as is joining two tables using a WHERE table1.column1=table2.column1 clause.

FULL JOIN is the opposite of INNER JOIN: it gives you all the rows from either table with the corresponding null values.

OUTER JOIN is equivalent to FULL OUTER JOIN. This gives you all rows in both the left and right tables, with null values present in both.

CROSS JOIN gives you a result set consisting of each row in the left table combined with every row in the right table — a Cartesian product. If you limit CROSS JOIN with the WHERE clause, it becomes equivalent to an INNER JOIN because the result set will exclude all non-matching ordered pairs.

NATURAL JOIN is similar to INNER JOIN but won’t duplicate columns with identical names. SELECT * FROM table1 INNER JOIN table2 ON table1.record_id=table2.record_id; prints the record_id column twice, but SELECT * FROM table1 NATURAL JOIN table2; gives you record_id only once.

STRAIGHT JOIN, specific to MySQL, behaves similarly to INNER JOIN but forces MySQL to read the left table first in the event the MySQL query optimizer is placing the tables in the wrong order. Generally this isn’t necessary or recommended, and STRAIGHT JOIN is only listed here because I’ve needed it a couple of times.

Joining Multiple Tables

To see a table of customers, contacts, contact names, order dates, and products from the sample databases at the W3Schools SQL Tester, you can do something like this:

SELECT Customers.CustomerID, CustomerName, ContactName, Orders.OrderID, OrderDate, ProductName, Unit
FROM Customers
INNER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
INNER JOIN Shippers ON Orders.ShipperID=Shippers.ShipperID
INNER JOIN OrderDetails ON Orders.OrderID=OrderDetails.OrderID
INNER JOIN Products ON Products.ProductID=OrderDetails.ProductID
ORDER BY Customers.CustomerID

(Remember, INNER JOIN excludes any null values, which won’t always be what you want.)

Various Tricks and Examples

Most of the time I’d suggest manipulating query results with Python or another general-purpose programming language, but here are some silly SQL tricks. All apply to MySQL unless otherwise indicated. It always pays to check your RDBMS documentation as there may be a function that does exactly what you want.

Show records with date fields ranging from today to two weeks into the future:

SELECT e.event, e.location, e.startdate
FROM events e
WHERE e.startdate BETWEEN CURDATE() and DATE_ADD(CURDATE(), INTERVAL 13 DAY)
ORDER BY e.startdate;

Show the difference in days between two dates:

SELECT DATEDIFF(event1_date, event2_date)
FROM (
   SELECT date AS event1_date
   FROM events
   WHERE event_name = 'Blobfish Appreciation Committee'
   ) x,
   (
   SELECT date as event2_date
   FROM events
   WHERE event_name = 'Swing Dancing Lessons'
   ) y

Ranking results from smallest to largest based on an integer field:

SELECT (SELECT COUNT(DISTINCT t2.numbers)
    FROM table2 t2
    WHERE t2.numbers <= t1.numbers) AS rank,
       t1.numbers
FROM table1 t1

Select the min and max integer values from a column:

SELECT patient, MIN(age) AS min_age, MAX(age) AS max_age
FROM patients
GROUP BY patient

Show the running total of an integer field:

SELECT e.event_name, e.event_attendance,
    (SELECT SUM(f.event_attendance) FROM events f
     WHERE f.event_id <= e.event_id) AS running_total
FROM events e

Skip every even row in a result set (show 1st, 3rd, 5th, etc.):

SELECT x.event_name
FROM (
     SELECT e.event_name,
       (SELECT COUNT(*)
           FROM events f
           WHERE f.event_name <= e.event_name) AS ename
     FROM events e
           ) x
WHERE MOD(x.ename, 2) = 1

Extract the initials from a name:

SELECT CASE
    WHEN name_count = 2 THEN
        TRIM(TRAILING '.' FROM
            CONCAT_WS('.',
              SUBSTR(SUBSTRING_INDEX(name, ' ', 1), 1, 1),
              SUBSTR(name, LENGTH(SUBSTRING_INDEX(name, ' ', 1))+2, 1),
              SUBSTR(SUBSTRING_INDEX(name, ' ', -1), 1, 1),
              '.'))
    ELSE
         TRIM(TRAILING '.' FROM
            CONCAT+WS('.',
              SUBSTR(SUBSTR_INDEX(name, ' ', 1), 1, 1),
              SUBSTR(SUBSTR_INDEX(name, ' ', -1), 1, 1)
              ))
    END AS initials
FROM (
   SELECT name, LENGTH(name)-LENGTH(REPLACE(name,' ' , '')) AS name_count
      FROM (
         SELECT REPLACE('Michael Schumacher', '.' ,'') AS name FROM table1
         )y
         )x

Find the average (mean) or median of a list of figures:

-- average/mean
SELECT name, AVG(age) AS avg_age
FROM patients 

-- median
SELECT AVG(age)
FROM (
   SELECT p.age
   FROM patients p, patients q
   WHERE p.name = q.name
      AND p.name LIKE '%Lohan'
   GROUP BY p.age
HAVING SUM(
   CASE WHEN p.age = q.age THEN 1 
   ELSE 0
   END) 
   >= ABS(SUM(SIGN(p.age - q.age)))
)

Create a horizontal histogram, i.e., a sideways bar graph:

SELECT department,
   LPAD('*', COUNT(*), '*') AS patient_count
FROM patients
GROUP BY department

Loading

Leave a Reply

Your email address will not be published. Required fields are marked *