Brain Dump

SQL

Tags
language soft-eng

Is the standard query language for relational databases.

This differs from the classic relational algebra of a relational database by returning lists instead of sets (meaning the uniqueness of a fetched record isn't guaranteed by the engine). It also moves away from the classic operations described in relational algebra, using SELECT to perform a PROJECT operation and WHERE to perform a SELECT operation.

Syntax

The actual syntax of SQL can vary from one dialect to another however the core constructs remain the same in most cases.

General Semantics

SQL is structured of queries each consisting of keywords in some orders ending with a ;. Comment are written Haskell style with -- and can be placed anywhere on a line. You can also declare multi-line comments like JavaScript with /* */. Strings are surrounding by single quotes (example 'hello') whereas back-ticks are used to quote SQL keywords such as table names with spaces (example `University Students`).

Create a Table

Is done through the CREATE TABLE command. The body of the definition has us place any columns of the table including any primary key or foreign key references.

CREATE TABLE BookTitle(
  isbn INT(13) NOT NULL PRIMARY KEY,
  title VARCHAR(30),
  author VARCHAR(20)
);

CREATE TABLE BookCopy(
  deweyId CHAR(30) NOT NULL PRIMARY KEY,
  onLoan BOOL DEFAULT FALSE,
  isbn INTEGER(13),
  FOREIGN KEY(isbn) REFERENCES BookTitle
);

Deferred Key Declarations

If desired you can mark any primary or foreign keys after defining all the columns.

CREATE TABLE Loan (
  memberId INT(10) NOT NULL,
  copyId CHAR(30) NOT NULL,
  issueDate DATE,
  dueDate DATE,
  PRIMARY KEY (memberId, copyId),
  FOREIGN KEY (memberId) REFERENCES Borrower,
  FOREIGN KEY (copyId) REFERENCES BookCopy(deweyId)
);

Templating from Existing Tables

You can build an SQL table by collecting records from a select query. This automatically infers the column types as well.

CREATE TABLE Foo AS
SELECT bar, baz, bag
FROM Bam
WHERE 1=1;

Or alternatively using the SELECT INTO syntax.

SELECT *
INTO newTable [IN externaldb]
FROM oldTable
WHERE condition;

Field Types

Each column in an SQL database is required to have a name and a data-type.

Table 1: Listing of various datatypes in SQL.
Data TypeDescription
CHAR(size)A fixed length string with maximum size SIZE (defaulting to 1).
VARCHAR(size)Same as CHAR(size) except we don't store all of SIZE, only the amount used.
BINARY(size)Same as CHAR(size) but for binary byte strings instead of apha-numeric.
VARBINARY(size)Like VARCHAR for BINARY(size).
TINYBLOBFor *B*inary *L*arge *OB*jects with max length 255 bytes.
TINYTEXTHolds a string with maximum length 255 characters.
TEXT(size)Holds a string with maximum length 65,535 bytes.
BLOB(size)For BLOBs holding up to 65,535 bytes of data.
MEDIUMTEXTHolds a string with maximum length 16,777,215 characters.
MEDIUMBLOBFor BLOBs holding up to 16,777,215 bytes of data.
LONGTEXTHolds a string with maximum length 4,294,967,295 characters.
LONGBLOBFor BLOBs holding up to 4,294,967,295 bytes of data.
ENUM(val1,val2,...)A string chosen from a user-specified list of (max 65535) possible values.
SET(val1,val2,...)A set of (0 or more) strings chosen from a list of (max 64) possible values.
BIT(size)A bit-value type storing up to 64-bits (defaulting to 1).
TINYINT(size)An integer ranging signed from and unsigned from [0, 255].
BOOLA basic boolean with 0 being false and non-zero being positive.
BOOLEANEquivalent to BOOL.
SMALLINT(size)An integer ranging signed from and unsigned from [0, 65535].
MEDIUMINT(size)An integer ranging signed from and unsigned from [0 to 16777215].
INT(size)An integer ranging signed from and unsigned from [0, 4294967295].
INTEGER(size)Same as INT(size).
BIGINT(size)An integer ranging signed from and unsigned from [0, 18446744073709551615].
FLOAT(size, d)A floating point number with SIZE being the total number of digits and D being digits after the decimal point.
DOUBLE(size, d)A normal sized floating point number.
DECIMAL(size, d)An exact fixed-point number taking the same form as FLOAT(size, d).
DEC(size, d)Same as DECIMAL(size, d).
DATEA date formatted as YYYY-MM-DD ranging from [1000-01-01, 9999-12-31].
DATETIME(fsp)Date and time formatted as YYYY-MM-DD HH:MM:SS.
TIMESTAMP(fsp)A variant of DATETIME stored as the number of seconds since the epoch.
TIME(fsp)A time formatted as hh:mm:ss.
YEARA four digit year value ranging from.

Field Constraints

The NOT NULL or PRIMARY KEY suffixes in the prior CREATE TABLE example is an SQL constraint. These modify the treatment of the term by the DBMS.

Table 2: List of some common column level SQL constraints.
ConstraintDescription
NOT NULLPrevents insertion of null for this column.
UNIQUEGuarantees no two records will have the smae value for this column.
PRIMARY KEYA combination of NOT NULL and UNIQUE. Uniqely identifies a record.
FOREIGN KEYReferences a field in another table (prevents destruction of links).
CHECKEnsures that the value in a column satisfies a specific condition.
DEFAULTSets a default value for a column if no value is specified.
CREATE INDEXGenerate an index on this column.
AUTO_INCREMENTDenotes that this integer column should be incremeneted for each record.

The CHECK constraint accepts a parameter that evaluates like a WHERE condition. If it evaluates to false an attempt to insert that record is rejected by the engine.

CREATE TABLE Persons (
  id INT NOT NULL,
  age INT,
  CHECK (age >= 18)
);

Modify Table Fields

You can alter a table after creating it including adding or removing fields.

ALTER TABLE Customers
ADD Email VARCHAR(255);
ALTER TABLE Persons
DROP COLUMN username;

In some cases you may need to modify the data-type of a column. This commonly uses the ALTER COLUMN syntax but depending on the dialect it could instead use MODIFY COLUMN or just COLUMN.

ALTER TABLE persons
ALTER COLUMN dateOfBirth YEAR;

Altering Constraints

You can change the DEFAULT constraint of a column or drop it altogether.

ALTER TABLE Persons
ALTER COLUMN city SET DEFAULT 'Sandness';

ALTER TABLE Persons
ALTER COLUMN city DROP DEFAULT;

Delete a Table

Removes every record in a given table and then removes the table itself.

DROP TABLE TableName;

Truncate a Table

Removes every record from a table, but doesn't delete the table itself.

Truncate TABLE TableName;

Create a Database

Some DBMS supports multiple disjoint databases. In these you may have to create and join a database before creating tables.

CREATE DATABASE DatabaseName;

Delete a Database

Drops a database including all the tables in the database.

DROP DATABASE DatabaseName;

Column Indexes

Indexes are references to records based on their value in some column. This speeds up lookups based on those columns immensely. This comes at the obvious cost of larger record sizes and slower updates/insertions.

CREATE INDEX indexName
ON tableName(column1, column2, ...);

If you need to prevent duplicates in your database you can use CREATE UNIQUE INDEX instead.

If you later need to remove an index you can use DROP INDEX;

DROP INDEX indexName ON tableName;

Create Views

A view is a virtual table defined as the result of an SQL query.

CREATE VIEW viewName AS
SELECT column1, column2, ...
FROM tableName
WHERE condition;

Such views once created can be interacted with like regular tables. You can further filter them down using a SELECT WHERE query. For example we can create a view to fetch all products above the average price.

CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);

And then select these on demand as required.

SELECT * FROM [Products Above Average Price];

Querying Records

The SELECT keyword is used to fetch records from a given table.

SELECT * FROM tableName;

If necessary you can select a subset of columns from the query set.

SELECT column1, column2, column3 FROM tableName;

SQL unlike relational algebra operates on the levels of lists, not sets. This means the uniqueness of the result set of a query isn't guaranteed. To work around this you qualify select with DISTINCT to ensure the output is unique.

SELECT UNIQUE productCount FROM products;

If you have repeat product counts in the regular select query then this reduces it to only the unique entries. [1, 2, 2, 2, 1] becomes just [1, 2].

Select can also nest into sub-queries.

-- Select the number of countries from which we
-- have at least one customers.
SELECT Count(*) AS DistinctCountries
FROM (SELECT DISTINCT Country FROM Customers);

Filter Conditions

The examples shown before returned all the records (or a unique subset of them) but generally you'll want to return records with some properties. This where the WHERE clause comes into affect.

SELECT * FROM Customers
WHERE country='Mexico';
Table 3: Availble arithmatic and bitwise operators
OperatorDescription
X + YAddition
X - YSubtraction
X * YMultiplication
X / YDivision
X % YModulo
X & YBitwise AND
X \vert YBitwise OR
X ^ YBitwise Exclusive OR
Table 4: List of basic SQL boolean operators.
OperatorDescription
X = YEquality
X > YGreater than
X < YLess than
X >= YGreater than or equal
X <= YLess than or equal
X <> Y (or !=)Not equal
BETWEEN X AND YWithin a range
X LIKE YGlob X using
X IN (y1, y2)Collection membership
X IN (SUBQUERY)Query membership
X AND YBoolean And
X OR YBoolean Or
NOT XBoolean Negation
IS NULLNull comparator
IS NOT NULLNull comparator negation
EXISTS (SUBQUERY)True if sub-query yields at least one record
ANY(SUBQUERY)True if any sub-query value meets the condition
ALL(SUBQUERY)True if all sub-query values meets the condition

ANY and ALL are a special sort of comparator that allows you to compare a single column value to a range of other values. This distinguishes it from EXISTS because it can support more operators than just membership. Consider the query

SELECT * FROM Table1 t1
WHERE t1.id < ANY(SELECT id FROM Table2);

The ANY means the condition will be satisfied if the operation (<) is true for at least one value in the sub-query. That is we compare the column t1.id with a collection of other Ids in a single query. Similarly ALL will be satisfied if the operation is true for all the values in the sub-query. For example we can use this to select the customers placed orders that are larger than the average of each customer order.

SELECT DISTINCT FirstName + ' ' + LastName AS 'Customer'
  FROM Customer, [Order]
WHERE Customer.Id = [Order].CustomerId
  AND TotalAmount > ALL
      (SELECT AVG(TotalAmount)
       FROM [ORDER]
       GROUP BY CustomerId)
Table 5: Listing of the various symbol semantics in SQL wildcards.
SymbolDescriptionExample
%Zero or more charactersbl% matches bl, black, blue and blob
_Represnts a single characterh_t matches hot, hat and hit
[]Represents any character in the bracketh[oa]t match hot and hat
^Reject any characters in the bracketh[^oa]t matches hit, not hot and hat
-Represents a range of charactersc[a-b]t matches cat and bat
  • Case Statement

    CASE
      WHEN condition1 THEN result1
      WHEN condition2 THEN result2
      WHEN conditionN THEN resultN
      ELSE result
    END;
  • Aggregate Filters

    The WHERE clause shown above suffers from one flaw. You cannot use it with aggregate functions meaning you can't filter based on summaries. To get around this we use the HAVING clause that works just like the WHERE except also supporting aggregations.

    For example the following query selects countries and the number of customers from those countries where the number of customers is at least greater than 5.

    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country
    HAVING COUNT(CustomerID) > 5;

Re-ordering Results

You can re-order the results of a SELECT query using the ORDER BY clause. You can specify whether to sort by ascending or descending, defaulting to ascending when neither is given.

SELECT column1, column2, ...
FROM tableName
ORDER BY column1, column2, ... ASC|DESC;

When specifying multiple columns the first takes priority over the other.

Limiting Record Counts

Beyond filtering we can also specify a maximum number of records to fetch. This differs depending on the DBMS but can either come before the column names or after the end of the query.

SELECT TOP number|percent columns FROM tableName WHERE condition;
-- or
SELECT columns FROM tableName WHERE condition LIMIT amount;

Result Aggregation

SQL provides mathematical operators you can use to aggregate or reduce result sets. This includes, but is not limited to, minimum, maximum, average and count.

SELECT MIN(columnName)
FROM tableName
WHERE condition;
OperatorDescription
MIN(column)Select the smallest value from column.
MAX(column)Select the largest value from column.
COUNT(column)Count the number of rows in the query.
AVG(column)Average the values of a numeric column.
SUM(column)Accumulate numbers in the given column.
  • Grouping Rows for Aggregation

    The aggregate operations shown in the previous section operate across the whole result set. In some cases it may be more appropriate to group records based on the value of another column and then aggregate these groups instead.

    For example the following counts customers by country. The result set shows the country in one column followed by the number of customers from that country in the next.

    SELECT COUNT(CustomerID), Country
    FROM Customers
    GROUP BY Country;

Aliases

Sometimes a column name may be too long or not detailed enough for a query.

SELECT columnName AS aliasName
FROM tableName;

We can now reference columnName as aliasName in any WHERE expressions.

In some cases, such as joining multiple tables, we may have to reference columns in each table explicitly. You can alias table names to make this much simpler.

SELECT bt.`name`, bt.author, bc.deweyId
FROM BookTitle bt, BookCopy bc
WHERE bt.isbn = bc.isbn;

You can also alias a sub-query as a table in its own right and then select from it.

SELECT b.forename, b.surname, t.`name`, m.dueDate
FROM Borrower b, BookCopy c, BookTitle t,
     (SELECT * FROM Loan WHERE dueDate < `2018-09-20`) AS od
WHERE od.memberId = b.memberId
  AND od.copyId = c.deweyId
  AND c.isbn = t.isbn;

Ternary Operator

Some times you may want to do a mathematical operation like SELECT 5+column but this will crash if column is NULL. Instead you can use the various null functions to specify defaults for a column value in case it is null.

These all take the same form of IFNULL(column, default) however the actual function name varies based on implementation:

  • IFNULL
  • COALESCE
  • ISNULL/IsNull
  • NVL

Inserting New Records

The INSERT INTO clause adds new records to a table.

INSERT INTO tableName(column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);

The column list after the table name is optional and if omitted the value list should be in the same order as the columns were defined in the CREATE TABLE clause.

Copying Records from Another Table

INSERT INTO table2
SELECT * FROM table1
WHERE condition;

Modifying Records

The UPDATE clause allows you to update the values of one or more records in a table in place.

UPDATE tableName
SET column1 = value1, column2 = value2, ...
WHERE condition;

Deleting Records

The DELETE FROM clause removes records from a table.

DELETE FROM tableName WHERE condition;

Joins

Allow you to combine records from two or more tables based on relations between them. There exists joins for each described in relational algebra section.

Inner Join

SELECT columnName
FROM table1
INNER JOIN table2
ON table1.columnName = table2.columnName;
-- or
SELECT columnName FROM table1, table2
WHERE table1.columnName = table2.columnName;
  • Natural Join

    Is a variant of inner-join where columns that match (the ON clause) in the two tables are reduced to a single column instead of appearing as two columns separate in the output table.

    SELECT * FROM BookTitle NATURAL JOIN BookCopy;
    -- or
    SELECT * FROM BookTitle, BookCopy USING (isbn);

Left Join

SELECT columnName
FROM table1
LEFT JOIN table2
ON table1.columnName = table2.columnName;

Right Join

SELECT columnName
FROM table1
RIGHT JOIN table2
ON table1.columnName = table2.columnName;

Full Join

This may also be referred to as cross join.

SELECT columnName
FROM table1
FULL OUTER JOIN table2
ON table1.columnName = table2.columnName
WHERE condition;

Self Join

Is the same as a regular join operation but the table is joined onto itself.

SELECT columnName
FROM table1 T1, table1 T2
WHERE condition;

Union

Union is a combination operation used to stack two result sets, one atop the other. This requires the columns from both select queries to have similar types and the same count in the same order.

SELECT columnName FROM table1
UNION
SELECT columnName FROM table2;

By default the union operation selects only distinct values, to avoid any repeat records in both SELECT statements appearing twice in the overall result. To bypass this you have to specify UNION ALL.

SELECT columnName FROM table1
UNION ALL
SELECT columnName FROM table2;

Stored Procedures

Offer a way to store some SQL code and then call it over and over again.

This is done in two parts. First be declaring the procedure:

CREATE PROCEDURE SelectAllCustomers
AS
SELECT * FROM Customers
GO;

And then by calling/invoking/executing it:

EXEC SelectAllCustomers;

Transactions

SQL by itself is a dangerous language. Each query could irreprably alter the database and if part of multiple modifications fails you'd want to be able to rollback or reverse to before any query was run. For situations like these you have transactions.

BEGIN TRANSACTION transName; -- Start the transaction

-- Declare the queries to be done by the transaction.
-- These aren't persisted to the db until you commit.
UPDATE accounts SET balance = balance - 1000 WHERE account_no = 100;
UPDATE accounts SET balance = balance + 1000 WHERE account_no = 200;

COMMIT; -- Save changes to the database at once.

ROLLBACK; -- Undo the commit and return to the original state.

Roles and Permissions

Some SQL engines support the concept of users with certain [see page 19, privileges] in regards to reading or writing the database. This takes the classical role-based access control implementation.

-- Let anyone select from the borrower table on MyDB
GRANT SELECT ON MyDB.Borrower TO PUBLIC;

-- Create a new user with a given password
CREATE USER 'user1'@'localhost'
IDENTIFIED BY 'passwd1';

-- Allow user1 to insert, delete, select and update
GRANT ALL ON MyDB.Loan TO 'user1'@'localhost';

-- Allow user1 to select and update the following columns
GRANT SELECT, UPDATE(issueDate, dueDate)
ON MyDB.Loan TO 'user1'@'localhost';

-- Grant user1 permissions and the ability to pass those
-- permissions to other users
GRANT updater ON MyDB.BookTitle TO
'user1'@'localhost' WITH GRANT OPTION;

-- Create two new roles
CREATE ROLE 'SuperUser', 'User';

-- Declare permissions for members of each role
GRANT ALL ON MyDB.* TO 'SuperUser';
GRANT SELECT ON MyDB.* TO 'User';

-- Grant or removes roles from specific users
GRANT 'SuperUser' TO 'user1'@'localhost';
GRANT 'User' TO 'user2'@'localhost';
GRANT 'User' TO 'user3'@'localhost';

REVOKE 'User' FROM 'user2'@'localhost';

Query Optimisation

There are several concerns when [see page 16, performing] a query:

  • How many rows are there to find (time)?
  • How difficult is it to find them (number of joins)?
  • Where to store the intermediate results (space)?

Possible approaches to this involves structuring queries using sub-queries, filtering large datasets first before joining and projecting columns before joining. Most DBMS' store index databases using B+ trees.

Links to this note