SQL
Is the standard query language for relational databases.
This differs from the classic relational algebra of a relational database by
returning lists instead of sets (meaning the uniqueness of a fetched record
isn't guaranteed by the engine).
It also moves away from the classic operations described in relational algebra,
using SELECT
to perform a PROJECT
operation and WHERE
to perform a SELECT
operation.
Syntax
The actual syntax of SQL can vary from one dialect to another however the core constructs remain the same in most cases.
General Semantics
SQL is structured of queries each consisting of keywords in some orders ending
with a ;
. Comment are written Haskell style with --
and can be placed anywhere
on a line. You can also declare multi-line comments like JavaScript with /* */
.
Strings are surrounding by single quotes (example 'hello'
) whereas back-ticks are
used to quote SQL keywords such as table names with spaces (example
`University Students`
).
Create a Table
Is done through the CREATE TABLE
command. The body of the definition has us place
any columns of the table including any primary key or foreign key references.
CREATE TABLE BookTitle(
isbn INT(13) NOT NULL PRIMARY KEY,
title VARCHAR(30),
author VARCHAR(20)
);
CREATE TABLE BookCopy(
deweyId CHAR(30) NOT NULL PRIMARY KEY,
onLoan BOOL DEFAULT FALSE,
isbn INTEGER(13),
FOREIGN KEY(isbn) REFERENCES BookTitle
);
Deferred Key Declarations
If desired you can mark any primary or foreign keys after defining all the columns.
CREATE TABLE Loan (
memberId INT(10) NOT NULL,
copyId CHAR(30) NOT NULL,
issueDate DATE,
dueDate DATE,
PRIMARY KEY (memberId, copyId),
FOREIGN KEY (memberId) REFERENCES Borrower,
FOREIGN KEY (copyId) REFERENCES BookCopy(deweyId)
);
Templating from Existing Tables
You can build an SQL table by collecting records from a select query. This automatically infers the column types as well.
CREATE TABLE Foo AS
SELECT bar, baz, bag
FROM Bam
WHERE 1=1;
Or alternatively using the SELECT INTO
syntax.
SELECT *
INTO newTable [IN externaldb]
FROM oldTable
WHERE condition;
Field Types
Each column in an SQL database is required to have a name and a data-type.
Data Type | Description |
---|---|
CHAR(size) | A fixed length string with maximum size SIZE (defaulting to 1). |
VARCHAR(size) | Same as CHAR(size) except we don't store all of SIZE, only the amount used. |
BINARY(size) | Same as CHAR(size) but for binary byte strings instead of apha-numeric. |
VARBINARY(size) | Like VARCHAR for BINARY(size). |
TINYBLOB | For *B*inary *L*arge *OB*jects with max length 255 bytes. |
TINYTEXT | Holds a string with maximum length 255 characters. |
TEXT(size) | Holds a string with maximum length 65,535 bytes. |
BLOB(size) | For BLOBs holding up to 65,535 bytes of data. |
MEDIUMTEXT | Holds a string with maximum length 16,777,215 characters. |
MEDIUMBLOB | For BLOBs holding up to 16,777,215 bytes of data. |
LONGTEXT | Holds a string with maximum length 4,294,967,295 characters. |
LONGBLOB | For BLOBs holding up to 4,294,967,295 bytes of data. |
ENUM(val1,val2,...) | A string chosen from a user-specified list of (max 65535) possible values. |
SET(val1,val2,...) | A set of (0 or more) strings chosen from a list of (max 64) possible values. |
BIT(size) | A bit-value type storing up to 64-bits (defaulting to 1). |
TINYINT(size) | An integer ranging signed from and unsigned from [0, 255]. |
BOOL | A basic boolean with 0 being false and non-zero being positive. |
BOOLEAN | Equivalent to BOOL. |
SMALLINT(size) | An integer ranging signed from and unsigned from [0, 65535]. |
MEDIUMINT(size) | An integer ranging signed from and unsigned from [0 to 16777215]. |
INT(size) | An integer ranging signed from and unsigned from [0, 4294967295]. |
INTEGER(size) | Same as INT(size). |
BIGINT(size) | An integer ranging signed from and unsigned from [0, 18446744073709551615]. |
FLOAT(size, d) | A floating point number with SIZE being the total number of digits and D being digits after the decimal point. |
DOUBLE(size, d) | A normal sized floating point number. |
DECIMAL(size, d) | An exact fixed-point number taking the same form as FLOAT(size, d). |
DEC(size, d) | Same as DECIMAL(size, d). |
DATE | A date formatted as YYYY-MM-DD ranging from [1000-01-01, 9999-12-31]. |
DATETIME(fsp) | Date and time formatted as YYYY-MM-DD HH:MM:SS. |
TIMESTAMP(fsp) | A variant of DATETIME stored as the number of seconds since the epoch. |
TIME(fsp) | A time formatted as hh:mm:ss. |
YEAR | A four digit year value ranging from. |
Field Constraints
The NOT NULL
or PRIMARY KEY
suffixes in the prior CREATE TABLE
example is an SQL
constraint. These modify the treatment of the term by the DBMS.
Constraint | Description |
---|---|
NOT NULL | Prevents insertion of null for this column. |
UNIQUE | Guarantees no two records will have the smae value for this column. |
PRIMARY KEY | A combination of NOT NULL and UNIQUE . Uniqely identifies a record. |
FOREIGN KEY | References a field in another table (prevents destruction of links). |
CHECK | Ensures that the value in a column satisfies a specific condition. |
DEFAULT | Sets a default value for a column if no value is specified. |
CREATE INDEX | Generate an index on this column. |
AUTO_INCREMENT | Denotes that this integer column should be incremeneted for each record. |
The CHECK
constraint accepts a parameter that evaluates like a WHERE
condition.
If it evaluates to false an attempt to insert that record is rejected by the engine.
CREATE TABLE Persons (
id INT NOT NULL,
age INT,
CHECK (age >= 18)
);
Modify Table Fields
You can alter a table after creating it including adding or removing fields.
ALTER TABLE Customers
ADD Email VARCHAR(255);
ALTER TABLE Persons
DROP COLUMN username;
In some cases you may need to modify the data-type of a column.
This commonly uses the ALTER COLUMN
syntax but depending on the dialect it could
instead use MODIFY COLUMN
or just COLUMN
.
ALTER TABLE persons
ALTER COLUMN dateOfBirth YEAR;
Altering Constraints
You can change the DEFAULT
constraint of a column or drop it altogether.
ALTER TABLE Persons
ALTER COLUMN city SET DEFAULT 'Sandness';
ALTER TABLE Persons
ALTER COLUMN city DROP DEFAULT;
Delete a Table
Removes every record in a given table and then removes the table itself.
DROP TABLE TableName;
Truncate a Table
Removes every record from a table, but doesn't delete the table itself.
Truncate TABLE TableName;
Create a Database
Some DBMS supports multiple disjoint databases. In these you may have to create and join a database before creating tables.
CREATE DATABASE DatabaseName;
Delete a Database
Drops a database including all the tables in the database.
DROP DATABASE DatabaseName;
Column Indexes
Indexes are references to records based on their value in some column. This speeds up lookups based on those columns immensely. This comes at the obvious cost of larger record sizes and slower updates/insertions.
CREATE INDEX indexName
ON tableName(column1, column2, ...);
If you need to prevent duplicates in your database you can use CREATE UNIQUE
INDEX
instead.
If you later need to remove an index you can use DROP INDEX
;
DROP INDEX indexName ON tableName;
Create Views
A view is a virtual table defined as the result of an SQL query.
CREATE VIEW viewName AS
SELECT column1, column2, ...
FROM tableName
WHERE condition;
Such views once created can be interacted with like regular tables. You can
further filter them down using a SELECT
WHERE
query.
For example we can create a view to fetch all products above the average price.
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
And then select these on demand as required.
SELECT * FROM [Products Above Average Price];
Querying Records
The SELECT
keyword is used to fetch records from a given table.
SELECT * FROM tableName;
If necessary you can select a subset of columns from the query set.
SELECT column1, column2, column3 FROM tableName;
SQL unlike relational algebra operates on the levels of lists, not sets. This
means the uniqueness of the result set of a query isn't guaranteed. To work
around this you qualify select with DISTINCT
to ensure the output is unique.
SELECT UNIQUE productCount FROM products;
If you have repeat product counts in the regular select query then this reduces it
to only the unique entries. [1, 2, 2, 2, 1]
becomes just [1, 2]
.
Select can also nest into sub-queries.
-- Select the number of countries from which we
-- have at least one customers.
SELECT Count(*) AS DistinctCountries
FROM (SELECT DISTINCT Country FROM Customers);
Filter Conditions
The examples shown before returned all the records (or a unique subset of them)
but generally you'll want to return records with some properties. This where the
WHERE
clause comes into affect.
SELECT * FROM Customers
WHERE country='Mexico';
Operator | Description |
---|---|
X + Y | Addition |
X - Y | Subtraction |
X * Y | Multiplication |
X / Y | Division |
X % Y | Modulo |
X & Y | Bitwise AND |
X \vert Y | Bitwise OR |
X ^ Y | Bitwise Exclusive OR |
Operator | Description |
---|---|
X = Y | Equality |
X > Y | Greater than |
X < Y | Less than |
X >= Y | Greater than or equal |
X <= Y | Less than or equal |
X <> Y (or != ) | Not equal |
BETWEEN X AND Y | Within a range |
X LIKE Y | Glob X using |
X IN (y1, y2) | Collection membership |
X IN (SUBQUERY) | Query membership |
X AND Y | Boolean And |
X OR Y | Boolean Or |
NOT X | Boolean Negation |
IS NULL | Null comparator |
IS NOT NULL | Null comparator negation |
EXISTS (SUBQUERY) | True if sub-query yields at least one record |
ANY(SUBQUERY) | True if any sub-query value meets the condition |
ALL(SUBQUERY) | True if all sub-query values meets the condition |
ANY and ALL are a special sort of comparator that allows you to compare a single
column value to a range of other values. This distinguishes it from EXISTS
because it can support more operators than just membership.
Consider the query
SELECT * FROM Table1 t1
WHERE t1.id < ANY(SELECT id FROM Table2);
The ANY
means the condition will be satisfied if the operation (<
) is true for at
least one value in the sub-query. That is we compare the column t1.id
with a
collection of other Ids in a single query.
Similarly ALL
will be satisfied if the operation is true for all the values in
the sub-query. For example we can use this to select the customers placed orders
that are larger than the average of each customer order.
SELECT DISTINCT FirstName + ' ' + LastName AS 'Customer'
FROM Customer, [Order]
WHERE Customer.Id = [Order].CustomerId
AND TotalAmount > ALL
(SELECT AVG(TotalAmount)
FROM [ORDER]
GROUP BY CustomerId)
Symbol | Description | Example |
---|---|---|
% | Zero or more characters | bl% matches bl, black, blue and blob |
_ | Represnts a single character | h_t matches hot, hat and hit |
[] | Represents any character in the bracket | h[oa]t match hot and hat |
^ | Reject any characters in the bracket | h[^oa]t matches hit, not hot and hat |
- | Represents a range of characters | c[a-b]t matches cat and bat |
Case Statement
CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 WHEN conditionN THEN resultN ELSE result END;
Aggregate Filters
The
WHERE
clause shown above suffers from one flaw. You cannot use it with aggregate functions meaning you can't filter based on summaries. To get around this we use theHAVING
clause that works just like theWHERE
except also supporting aggregations.For example the following query selects countries and the number of customers from those countries where the number of customers is at least greater than 5.
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(CustomerID) > 5;
Re-ordering Results
You can re-order the results of a SELECT
query using the ORDER BY
clause.
You can specify whether to sort by ascending or descending, defaulting to
ascending when neither is given.
SELECT column1, column2, ...
FROM tableName
ORDER BY column1, column2, ... ASC|DESC;
When specifying multiple columns the first takes priority over the other.
Limiting Record Counts
Beyond filtering we can also specify a maximum number of records to fetch. This differs depending on the DBMS but can either come before the column names or after the end of the query.
SELECT TOP number|percent columns FROM tableName WHERE condition;
-- or
SELECT columns FROM tableName WHERE condition LIMIT amount;
Result Aggregation
SQL provides mathematical operators you can use to aggregate or reduce result sets. This includes, but is not limited to, minimum, maximum, average and count.
SELECT MIN(columnName)
FROM tableName
WHERE condition;
Operator | Description |
---|---|
MIN(column) | Select the smallest value from column. |
MAX(column) | Select the largest value from column. |
COUNT(column) | Count the number of rows in the query. |
AVG(column) | Average the values of a numeric column. |
SUM(column) | Accumulate numbers in the given column. |
Grouping Rows for Aggregation
The aggregate operations shown in the previous section operate across the whole result set. In some cases it may be more appropriate to group records based on the value of another column and then aggregate these groups instead.
For example the following counts customers by country. The result set shows the country in one column followed by the number of customers from that country in the next.
SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country;
Aliases
Sometimes a column name may be too long or not detailed enough for a query.
SELECT columnName AS aliasName
FROM tableName;
We can now reference columnName
as aliasName
in any WHERE
expressions.
In some cases, such as joining multiple tables, we may have to reference columns in each table explicitly. You can alias table names to make this much simpler.
SELECT bt.`name`, bt.author, bc.deweyId
FROM BookTitle bt, BookCopy bc
WHERE bt.isbn = bc.isbn;
You can also alias a sub-query as a table in its own right and then select from it.
SELECT b.forename, b.surname, t.`name`, m.dueDate
FROM Borrower b, BookCopy c, BookTitle t,
(SELECT * FROM Loan WHERE dueDate < `2018-09-20`) AS od
WHERE od.memberId = b.memberId
AND od.copyId = c.deweyId
AND c.isbn = t.isbn;
Ternary Operator
Some times you may want to do a mathematical operation like SELECT 5+column
but
this will crash if column is NULL
. Instead you can use the various null functions
to specify defaults for a column value in case it is null.
These all take the same form of IFNULL(column, default)
however the actual
function name varies based on implementation:
- IFNULL
- COALESCE
- ISNULL/IsNull
- NVL
Inserting New Records
The INSERT INTO
clause adds new records to a table.
INSERT INTO tableName(column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
The column list after the table name is optional and if omitted the value list
should be in the same order as the columns were defined in the CREATE TABLE
clause.
Copying Records from Another Table
INSERT INTO table2
SELECT * FROM table1
WHERE condition;
Modifying Records
The UPDATE
clause allows you to update the values of one or more records in a
table in place.
UPDATE tableName
SET column1 = value1, column2 = value2, ...
WHERE condition;
Deleting Records
The DELETE FROM
clause removes records from a table.
DELETE FROM tableName WHERE condition;
Joins
Allow you to combine records from two or more tables based on relations between them. There exists joins for each described in relational algebra section.
Inner Join
SELECT columnName
FROM table1
INNER JOIN table2
ON table1.columnName = table2.columnName;
-- or
SELECT columnName FROM table1, table2
WHERE table1.columnName = table2.columnName;
Natural Join
Is a variant of inner-join where columns that match (the
ON
clause) in the two tables are reduced to a single column instead of appearing as two columns separate in the output table.SELECT * FROM BookTitle NATURAL JOIN BookCopy; -- or SELECT * FROM BookTitle, BookCopy USING (isbn);
Left Join
SELECT columnName
FROM table1
LEFT JOIN table2
ON table1.columnName = table2.columnName;
Right Join
SELECT columnName
FROM table1
RIGHT JOIN table2
ON table1.columnName = table2.columnName;
Full Join
This may also be referred to as cross join.
SELECT columnName
FROM table1
FULL OUTER JOIN table2
ON table1.columnName = table2.columnName
WHERE condition;
Self Join
Is the same as a regular join operation but the table is joined onto itself.
SELECT columnName
FROM table1 T1, table1 T2
WHERE condition;
Union
Union is a combination operation used to stack two result sets, one atop the other. This requires the columns from both select queries to have similar types and the same count in the same order.
SELECT columnName FROM table1
UNION
SELECT columnName FROM table2;
By default the union operation selects only distinct values, to avoid any repeat
records in both SELECT
statements appearing twice in the overall result. To
bypass this you have to specify UNION ALL
.
SELECT columnName FROM table1
UNION ALL
SELECT columnName FROM table2;
Stored Procedures
Offer a way to store some SQL code and then call it over and over again.
This is done in two parts. First be declaring the procedure:
CREATE PROCEDURE SelectAllCustomers
AS
SELECT * FROM Customers
GO;
And then by calling/invoking/executing it:
EXEC SelectAllCustomers;
Transactions
SQL by itself is a dangerous language. Each query could irreprably alter the database and if part of multiple modifications fails you'd want to be able to rollback or reverse to before any query was run. For situations like these you have transactions.
BEGIN TRANSACTION transName; -- Start the transaction
-- Declare the queries to be done by the transaction.
-- These aren't persisted to the db until you commit.
UPDATE accounts SET balance = balance - 1000 WHERE account_no = 100;
UPDATE accounts SET balance = balance + 1000 WHERE account_no = 200;
COMMIT; -- Save changes to the database at once.
ROLLBACK; -- Undo the commit and return to the original state.
Roles and Permissions
Some SQL engines support the concept of users with certain [see page 19, privileges] in regards to reading or writing the database. This takes the classical role-based access control implementation.
-- Let anyone select from the borrower table on MyDB
GRANT SELECT ON MyDB.Borrower TO PUBLIC;
-- Create a new user with a given password
CREATE USER 'user1'@'localhost'
IDENTIFIED BY 'passwd1';
-- Allow user1 to insert, delete, select and update
GRANT ALL ON MyDB.Loan TO 'user1'@'localhost';
-- Allow user1 to select and update the following columns
GRANT SELECT, UPDATE(issueDate, dueDate)
ON MyDB.Loan TO 'user1'@'localhost';
-- Grant user1 permissions and the ability to pass those
-- permissions to other users
GRANT updater ON MyDB.BookTitle TO
'user1'@'localhost' WITH GRANT OPTION;
-- Create two new roles
CREATE ROLE 'SuperUser', 'User';
-- Declare permissions for members of each role
GRANT ALL ON MyDB.* TO 'SuperUser';
GRANT SELECT ON MyDB.* TO 'User';
-- Grant or removes roles from specific users
GRANT 'SuperUser' TO 'user1'@'localhost';
GRANT 'User' TO 'user2'@'localhost';
GRANT 'User' TO 'user3'@'localhost';
REVOKE 'User' FROM 'user2'@'localhost';
Query Optimisation
There are several concerns when [see page 16, performing] a query:
- How many rows are there to find (time)?
- How difficult is it to find them (number of joins)?
- Where to store the intermediate results (space)?
Possible approaches to this involves structuring queries using sub-queries, filtering large datasets first before joining and projecting columns before joining. Most DBMS' store index databases using B+ trees.