Structured Query Language (SQL)

 

 

 

SQL is scripting syntax for running database queries and performing Create, Read, Update, and Delete (CRUD) operations in a relational database system. Depending on the database management system (DBMS), different versions of SQL will have slight syntactical variations. However, foundational CRUD syntax will typically be common among DBMSs (e.g. SELECT, INSERT, UPDATE, DELETE, etc.).

 

Since I primarily work in Python, I use the SQLAlchemy library to leverage DBMS-agnostic queries and object-oriented programming. This allows me to query relational databases (on-prem and cloud database servers) data directly from my Python environment. I’ve written and published my own SQLAlchemy class code that executes queries that I frequently use, and I continually update this class as my query needs expand.

 

Here we will explore how SQL is used to build database architectures and navigate the data within it. We start by understanding how basic tables are constructed in a relational database, how SQL builds tables that comprise an architecture, and how SQL creates, reads, updates, and deletes data, among other operations.

 

Primary Key:  Primary keys in a database are records’ unique identifiers.  These values cannot be repeated across records within a given table (unique values), and are used to index records for efficient data retrieval.  When implemented properly in a relational database, primary keys are referenced in other tables as foreign keys.

 

Foreign Key:  Foreign keys in a database are reference values with which records in a data table are related to records in another data table.  These reference values are primary keys in their source tables that encapsulate all the data in the corresponding record. Note that it is not good practice for a table reference its own values.

 

Applying Keys to Database Tables

The table below shows records for an office supply company. It tracks their products and indicates their respective product managers. While this data is valuable to track, its database implementation is inefficient given the number of repeated records.

 

A screenshot of a computer screen

AI-generated content may be incorrect.

 

Implementing primary keys and foreign keys is a valuable strategy for designing efficient databases. The goal is to reduce the number of repeated records in any given table, thus reducing the memory required to store the database. A key benefit in reducing repeated records in a database is more efficient queries.

The orange table above can be broken out into two tables to eliminate the repeated product manager records. By creating a managers table comprised of unique manager records, the products table becomes less cluttered by eliminating 4 columns of repeated values. This is where primary keys and foreign keys become useful!

A screenshot of a computer

AI-generated content may be incorrect.

 

The blue table above shows records for product managers' contact infromation. The 'manager_id' column is defined as a primary key and cast as an integer type. The 'first_name', 'last_name', 'email_address', and 'phone_number' columns are cast as string types.

 

Note that the 'phone_number' column contains numeric values, but the dashes in each value make the records ineligible for numeric types. Alternatively, the dashes could be removed in extract, transform, load (ETL) and the records can then be stored as numeric types, specifically integer types.

 

A screen shot of a product

AI-generated content may be incorrect.

 

The green table above implements primary keys and foreign keys to detail a company's products. The 'product_id' column is defined as a primary key and cast as an integer type. The 'product_number', 'product_name', and 'unit' columns are cast as string types, and the 'product_unit_price' column is cast as a float type.

The 'product_manager' column is defined as a foreign key to the managers table primary keys and cast as and integer type, which relates each product to a manager. This also simplifies manager reassignments, as only the foreign key value needs to be updated to the reassigned manager. Furthermore, the manager table can be used in other tables throughout the databaase, keeping a consistent link between managers, products, and other tracked instances.

The foreign keys in the 'product_manager' column reference the primary keys in the managers table, relating the records in the products table to their product manager.

 

 



Once the database is architected and loaded with data, SQL is used to interact with the data. Interactions include, but are not limited to:

Simply put, SQL performs queries, which are action requests.



SQL Database Architecture Syntax



Databases are structured storage containers that store data. They are comprised of tables that are formatted into rows and columns, facilitating data retrieval and manipulation.

Each row is a data record, comprised of data distributed among its column. Each database column, or field, is assigned a label to identify the data stored in that table cell.

Rows are not inherently ordered, so they must be assigned an identifying value, known primary keys. Using row indices or record primary keys with column names, records can be accessed by identifying their row-column intersection in the table.

Below are some useful SQL operations for navigating and building a database architecture. Note that SQL syntax can vary across DBMSs, so red notes have been added where syntax variation is possible.

 

SHOW DATABASE

This query shows all existing databases

SHOW DATABASE



CREATE DATABASE

This query creates a new database by the name of <DATABASE_NAME> in a server.

CREATE DATABASE <DATABASE_NAME>

 

Attempting to create a database by the name of an existing database in the same server will yield an error, so adding the keyword 'IF NOT EXISTS' first checks if the database name already exists without raising an error.

CREATE DATABASE IF NOT EXISTS <DATABASE_NAME>

 

In both cases above, if a database creation by the name of an existing database is attempted , the attempt will not complete.



DROP DATABASE

This query deletes an existing database by the name of <DATABASE_NAME> in a server.

DROP DATABASE <DATABASE_NAME>

 

Attempting to delete a database that does not exist in a given server will yield an error, so adding the keyword 'IF NOT EXISTS' first checks if the database name exists without raising an error.

DROP DATABASE IF EXISTS <DATABASE_NAME>

 

USE DATABASE

Working with a database's tables requires first telling SQL which database to use by its <DATABASE_NAME>.

USE <DATABASE_NAME>



SHOW TABLES

This query shows all existing tables in a database

SHOW TABLES

 

 

EXPLAIN

This query examines <TABLE>'s column specifications. Note that this simply lists column names, details about the type of data it stores, and its restrictions - not the data itself.

EXPLAIN <TABLE>

 

 

CREATE TABLE

This query creates a new table by the name <TABLE>.

The 'IF NOT EXISTS' keyword is available but not required. This first checks if <TABLE> exists in the database before creating one without raising an error.

CREATE TABLE IF NOT EXISTS <TABLE>

(

<FIELD_NAME> <FIELD_DATA_TYPE> <FIELD_MODIFIER>,

<FIELD_NAME> <FIELD_DATA_TYPE> <FIELD_MODIFIER>,

<FIELD_NAME> <FIELD_DATA_TYPE> <FIELD_MODIFIER

)

 

 

FIELD DATA TYPES

Data types are used to define the type of data that is stored in a field. The field's data type is defined when the table is created, and all values within that column must adhere to the defined data type.

 

BOOL

Boolean value

TRUE

INT

Integer number

100

DECIMAL(<M>, <N>)

Decimal, or floating point, that specifies the number of permissible digits, where <M> is the total number of digits in the value and <N> is the number of digits after the decimal

DECIMAL(5, 2): 100.25

DOUBLE

Double-precision floating point number



DATE

Date formatted as YYYY-MM-DD

1970-01-01

TIME

Time formatted as HH:MM:SS

12:30:56

DATETIME

Combined date and time formatted as YYYY-MM-DD HH:MM:SS

1970-01-01 12:30:56

YEAR

Year formatted as YY or YYYY

1970

TIMESTAMP

Automatic date and time of record entry

1970-01-01 12:30:56

CHAR(<N>)

String of defined length up to 255 characters that is stored with padding

CHAR(15): 'HELLO WORLD_ _ _ _"

VARCHAR(<N>)

String of variable length up to 255 characters that is stored without padding

VARCHAR(15): 'HELLO WORLD"

TEXT

String of up to 65,535 characters

"HELLO WORLD"

BLOB

Binary type for variable data



ENUM(<VAL_0>, <VAL_1>)

Single string value from a defined list of multiple permitted values (only 1 value), up to thousands

ENUM('RED', 'YELLOW', 'BLUE') 'BLUE'

SET(<VAL_0>, <VAL_1>)

String of multiple strings from a defined list of multiple permitted values (one or more values), up to 64 options

ENUM('RED', 'YELLOW', 'BLUE') ('RED', 'BLUE')

ENUM() and SET() are mote involved data types that will be elaborated upon later.



FIELD MODIFIERS

Field modifiers define the behavior of a field. The field's modifier is defined when the table is created, and all values within that column adopt the defined modifier's behavior.

 

PRIMARY KEY

Indicates that the values in that field are primary keys for the records in that table

UNIQUE

Indicates that values cannot be repeated among records in that field

NOT NULL

Indicates that records must contain a value in that field

AUTO_INCREMENT

For numeric fields, automatically increments values by one from the previous record's value

DEFAULT

Indicates a value to use when no value is given for that field upon record insertion



ALTER TABLE

A <TABLE>'s format can be modified with the 'ALTER_TABLE' query. One or many modifications can be made to a table by creating a comma-separated list of intended alteration in the query.

ALTER TABLE <TABLE>

<ALTERATION_KEYWORD> <ALTERATION_SYNTAX>,

<ALTERATION_KEYWORD> <ALTERATION_SYNTAX>,

<ALTERATION_KEYWORD> <ALTERATION_SYNTAX>

 

ALTERATION KEYWORD

Definition

ALTERATION SYNTAX

ADD

Add a new column to an existing table

ALTER TABLE <TABLE>

ADD COLUMN <FIELD_NAME> <FIELD_DATA_TYPE> <FIELD_MODIFIER>

ADD PRIMARY KEY

Adds a primary key to an existing column

ALTER TABLE <TABLE>

ADD PRIMARY KEY <COL>

RENAME COLUMN

Renames the indicated existing column

ALTER TABLE <TABLE>

RENAME COLUMN <OLD_FIELD_NAME> TO <NEW_FIELD_NAME>

DROP COLUMN

Permanently deletes a column from a table

ALTER TABLE <TABLE>

DROP COLUMN <FIELD_NAME>

ALTER (or MODIFY)

Changes the field's data type

ALTER TABLE <TABLE>

ALTER COLUMN <FIELD_NAME> TYPE <NEW_DATA_TYPE>

*Syntax varies by DBMS

 

 

DROP TABLE

This query deletes a new table by the name <TABLE>.

The 'IF EXISTS' keyword is available but not required. This first checks if <TABLE> exists in the database before deleting it without raising an error.

DROP TABLE IF EXISTS <TABLE>

 

 

Table dependencies must be considered when deleting tables from a relational database. Tables that are depended on by other tables - in other words, tables that contain primary keys that other tables have foreign keys depend on - cannot be deleted. Doing so would break the dependency chain, which SQL constraints protect from.

To delete these tables, first delete tables that contain foreign keys but no depended-on primary keys. Iterate through this process until the desired table is deleted, but do not disable table constraints as that can compromise data integrity.

Alternatively, SQL offers a CASCADE option for the 'DROP TABLE' query, which deletes the table and its dependent objects.

DROP TABLE IF EXISTS <TABLE> CASCADE

 

 

SQL Data Query Syntax


COMPARISON OPERATORS

 

LOGICAL OPERATORS

 

COMMENTING QUERIES

Comments can be placed in a query by enclosing the comment text in /* ... */. Any text enclosed within this syntax will be ignored when executing a query.

/* THIS IS SQL COMMENT SYNTAX */

 

 

SELECT

The 'SELECT' statement selects data for retrieval from a database. The <COL> portion following 'SELECT' are the field names, or table column names, each separated by a comma.

After indicating fields, the 'FROM' statement indicates the table from which data is being retrieved. Note that if specific field names are indicated in the 'SELECT' statement, then those fields must be present in the indicated table.

 

Depending on the database architecture, the query may need to indicate the schema in which the table is stored: <SCHEMA_NAME>.<TABLE_NAME>

SELECT <COLS>

FROM <TABLE>

 

If all fields are to be retrieved, the '*' can be used for the <COL> portion instead of explicitly stating each field name in the query.

SELECT *

FROM <TABLE>



DISTINCT

The 'DISTINCT' keyword is used in the 'SELECT' statement to retrieve unique values from the indicated <COL>. Multiple <COL>s can be listed after 'DISTINCT', each separated by a comma, which will prompt 'DISTINCT' to retrieve unique data combinations by row of data in the indicated <COL>s.

SELECT DISTINCT <COLS>

FROM <TABLE>

 

Like standard 'SELECT' statements, a 'DISTINCT' keyword can select all table fields with the '*'.

SELECT DISTINCT *

FROM <TABLE>



WHERE

The 'WHERE' clause is used to conditionally filter data, where only records satisfying the indicated conditions are retrieved. It uses comparison operators, to compare record values to an indicated value. If the condition yields TRUE, then the entire record is retrieved in the query.

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> <COMPARISON_OPERATOR> <VALUE>

 

Multiple conditions can be indicated in the 'WHERE' clause using logical operators

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> <COMPARISON_OPERATOR> <VALUE>

<LOGICAL_OPERATOR> <CONDITIONAL_COL> <COMPARISON_OPERATOR> <VALUE>

<LOGICAL_OPERATOR> <CONDITIONAL_COL> <COMPARISON_OPERATOR> <VALUE>

 

Queries can also be performed to identify records that DO NOT match the defined conditions using one of two methods:

1.       The first method places the keyword 'NOT' before the <CONDITIONAL_COL>.

SELECT <COLS>

FROM <TABLE>

WHERE NOT <CONDITIONAL_COL> <COMPARISON_OPERATOR> <VALUE>

 

2.       The second method places a '!' before the operator to negate it.

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> !<COMPARISON_OPERATOR> <VALUE>

 

Note that some versions of SQL do not recognize '!>' and '!<', as they are equivalent to '<' and '>', respectively. '<>' is acceptable in place of '!=', but not encouraged.



NULL

The 'NULL' keyword is used to indicate record fields that do not contain values. This keyword exists because null fields cannot be identified with logical operators.

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> IS NULL

 

The 'NULL' keyword can also be used with the 'NOT' operator, negating the 'NULL' and instead returning records with fields that do contain values.

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> IS NOT NULL



IN

The 'IN' operator enables queries to apply multiple 'WHERE' parameters with the 'OR' logical operator in a single query.

In other words,

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> = <VAL_0>

OR <CONDITIONAL_COL> = <VAL_1>)

 

can alternately be expressed as

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> IN (<VAL_0>, <VAL_1>)

 

The 'IN' operator can also be used with the 'NOT' operator, negating 'IN' and instead selecting records where values are not among the indicated values.

SELECT <COLS>

FROM <TABLE>

WHERE <CONDITIONAL_COL> NOT IN (<VAL_0>, <VAL_1>)



BETWEEN

The 'BETWEEN' operator is used with the 'WHERE' clause to select records where field values fall within the indicated range.

The indicated range can be numeric or alphabetic, where alphabetic values are strings enclosed in quotations and numeric values are integers or floats.

SELECT <COLS>

FROM <TABLE>

WHERE <COL> BETWEEN <MIN_VAL> AND <MAX_VAL>

 

The 'BETWEEN' operator can also be used with the 'NOT' operator, negating 'BETWEEN' and instead returning records where values do not fall in the indicated range.

SELECT <COLS>

FROM <TABLE>

WHERE <COL> NOT BETWEEN <MIN_VAL> AND <MAX_VAL>



LIKE

The 'LIKE' operator is used in a 'WHERE' clause to search for an indicated pattern in the indicated <COL>. This is especially useful when querying string values for which exact values may be unknown. The syntax uses wildcard characters to define the pattern:

 

'LIKE' operator syntax with wildcard characters is detailed below.

SELECT <COLS>

FROM <TABLE>

WHERE <COL>

LIKE <LIKE_PARAMETER>


Like Parameters:

 

 

ORDER BY

The 'ORDER BY' keyword sorts the queried records by the parameter values in the indicated <COL>.

 

The sorting direction can be defined with an optional <SORT_KEYWOR>:

SELECT <COLS>

FROM <TABLE>

ORDER BY <SORT_COL> <SORT_KEYWORD>

 

An 'ORDER BY' can be performed along multiple fields. So fields will be sorted in the order in which they are indicated in the 'ORDER BY' clause. Assuming we sort <TABLE> along <SORT_COL_0>, then <SORT_COL_1>:

SELECT <COLS>

FROM <TABLE>

ORDER BY <SORT_COL_0> <SORT_KEYWORD>, <SORT_COL_1> <SORT_KEYWORD>

 

So, <TABLE> will be sorted along <SORT_COL_0>, then records that have common values along the first sort will be sorted further along <SORT_COL_1>.

 

 

GROUP BY

The 'GROUP BY' statement groups records that have matching values in the indicated <GROUPING_COL>, then performs the indicated <FUNCTION> on each group's values. This query returns the unique <GROUPING_COL> groups and the <FUNCTION> results.

SELECT <FUNCTION>(<COL>)

FROM <TABLE>

GROUP BY <GROUPING_COL>

 

Multiple <FUNCTION>s can be applied to the groupings, and the results for each function are returned in their own columns, as indicated in the 'SELECT' statement.

SELECT <FUNCTION_0>(<COL>), <FUNCTION_1>(<COL>), <GROUPING_COL>

FROM <TABLE>

GROUP BY <GROUPING_COL>



INSERT

The 'INSERT INTO' operator inserts new records into the indicated table. The query first indicates the field names, which must match those in the indicated table. The 'VALUES' are then listed in the order in which their corresponding columns are indicated.

INSERT INTO <TABLE> (<COL_0>, <COL_1>, ... ,<COL_N>)

VALUES(<COL_0_VALUE>, <COL_1_VALUE>, ..., <COL_N_VALUE>)

 

Column names do not need to be indicated if values are being inserted into all table columns. However, the order in which the values indicated for insertion are presented in the query must match their placement sequence in the table if the 'INSERT INTO' operator is not going to indicate columns.

 

Multiple records can also be inserted in a single query by placing each set of new record values in parentheses and comma separating each new record in the statement.

INSERT INTO <TABLE> (<COL_0>, <COL_1>, ... ,<COL_N>)

VALUES

(<COL_0_VALUE_0>, <COL_1_VALUE_0>, ..., <COL_N_VALUE_0>),

(<COL_0_VALUE_1>, <COL_1_VALUE_1>, ..., <COL_N_VALUE_1>),

(<COL_0_VALUE_2>, <COL_1_VALUE_2>, ..., <COL_N_VALUE_2>),

...

(<COL_0_VALUE_M>, <COL_1_VALUE_M>, ..., <COL_N_VALUE_M>)



UPDATE

The 'UPDATE' statement updates existing record values in a table with new indicated values.

UPDATE <TABLE>

SET <COL_0> = <VALUE_0>

 

The 'UPDATE' statement can use the 'WHERE' clause to conditionally update values. Multiple field updates can be performed simultaneously by separating each record-to-update with a comma.

UPDATE <TABLE>

SET <COL_0> = <VALUE_0>,

<COL_1> = <VALUE_1>

WHERE <CONDITIONAL_COL> <OPERATOR> <CONDITIONAL_VALUE>

 

 

DELETE

The 'DELETE' statement removes records from the indicated table. If no conditions are defined in a 'DELETE' operation, then all records will be deleted from the indicated table.

DELETE FROM <TABLE>

 

Alternatively, 'TRUNCATE TABLE' will also delete all data from <TABLE>. This is typically faster than 'DELETE'.

TRUNCATE TABLE <TABLE>


Record deletion with 'DELETE' can use the 'WHERE' clause to delete records conditionally. Instead of deleting all records from a table, records are deleted only if the indicated conditions are satisfied. 

DELETE FROM <TABLE>

WHERE <CONDITIONAL_COL> <OPERATOR> <CONDITIONAL_VALUE>

 

Note that 'TRUNCATE' does not allow 'WHERE' clauses.

 

 

FUNCTIONS

Conditional statements can be applied to conditionally execute the functions in the queries below.

 

Return the smallest value record in the indicated column

SELECT MIN(<COL>)

FROM <TABLE>

 

Return the largest value record in the indicated column

SELECT MAX(<COL>)

FROM <TABLE>

 

Return the number of records in the indicated column

SELECT COUNT(<COL>)

FROM <TABLE>

 

 

Return the average of values in the indicated column

SELECT AVG(<COL>)

FROM <TABLE>

 

Returns the sum of values in the indicated column

SELECT SUM(<COL>)

FROM <TABLE>

 

Generate concatenated fields by listing fields and separators as inputs

SELECT CONCAT(<COL_0>, <SEPARATOR>, <COL_1>, <SEPARATOR>, <COL_2)

FROM <TABLE>

 

CONCAT_WS function behaves similarly to CONCAT, but it automatically recognizes the first input as the separator to be applied between each field that follows

SELECT CONCAT_WS(<SEPARATOR>, <COL_0>, <COL_1>, <COL_2>)

FROM <TABLE>



ALIAS

Aliases are temporary names that are assigned to tables or columns within a given query, using the 'AS' keyword. This will not affect the database architecture, and is particularly useful for consolidating subqueries (AKA inner queries) when performing nested or complex queries. This can drastically improve query readability.

 

Encapsulate a query in <ALIAS>

SELECT <ALIAS>.<COL>

FROM <TABLE> AS <ALIAS>

 

Create a new data column titled <ALIAS> comprised of <COL> values

SELECT <COL> AS <ALIAS>

FROM <TABLE>



Create a new data column titled <ALIAS> comprised of <COL_0> and <COL_1> values

SELECT <COL_0> + <COL_1> AS <ALIAS>

FROM <TABLE>



Create a new data column titled <ALIAS> to store an aggregate function value

SELECT <FUNCTION>(<COL>) AS <ALIAS>

FROM <TABLE>

 

 

As a personal preference, I try to be as explicit as possible when referencing table elements to reduce the likelihood of a query error. Depending on the element I'm calling in my query, I usually follow the following convention: <SCHEMA_NAME>.<TABLE_NAME>.<COLUMN_NAME>

However, this convention can be cumbersome to write/read, so aliasing encapsulates the entire element path in a temporary variable name.



TOP/LIMIT/FETCH FIRST

This is an example of SQL syntax variations. Generally referred to as 'TOP', this clause is used with the 'SELECT' statement to identify the first N records in a table. This is particularly useful when querying larger data sets and only wanting to view small sample of the data.

 

'LIMIT' and 'FETCH FIRST' perform the same action as 'TOP', but their syntaxes vary by the version of SQL implemented in a database.

 

TOP

SELECT TOP <N> <COLS>

FROM <TABLE>

 

LIMIT

SELECT <COLS>

FROM <TABLE>

LIMIT <N>

 

FETCH FIRST

SELECT <COLS>

FROM <TABLE>

FETCH FIRST <N> ROWS ONLY



These clauses can also be used with other SQL operations to achieve different views of data. For example, the last N records in a table can be queried by applying the 'ORDER BY' operation with the 'DESC' sort keyword to apply the 'TOP' clause:

 

TOP

SELECT TOP <N> <COLS> 

FROM (

SELECT <COLS>

FROM <TABLE>

ORDER BY <SORT_COL> DESC)

 

LIMIT

SELECT <COLS>

FROM <TABLE>

ORDER BY <SORT_COL> DESC

LIMIT <N>

 

FETCH FIRST

SELECT <COLS>

FROM <TABLE>

ORDER BY <SORT_COL> DESC

FETCH FIRST <N> ROWS ONLY

 

 

The 'TOP' and 'FETCH FIRST' clauses also offers the 'PERCENT' keyword, which enables a query to select the top N percent of records in a table.

 

TOP

SELECT TOP <N> PERCENT <COLS>

FROM <TABLE>

 

LIMIT

not available

 


FETCH FIRST

SELECT <COLS>

FROM <TABLE>

FETCH FIRST <N> PERCENT ROWS ONLY

 

 

RANK

The 'RANK' function ranks data based on an indicated value field. It requires an 'OVER' clause, which specifies a <RANKING_SCHEME> that defines how the ranking should be performed. The <RANKING_SCHEME> is performed on a defined field and outputs a rank value in rank column titled <ALIAS> for every record.

SELECT <COLS>,

RANK() OVER (<RANKING_SCHEME>) AS <ALIAS>

FROM <TABLE>

 

Multiple 'RANK' functions can be implemented in a single query by comma-separating them as individual columns in a query output.

SELECT <COLS>,

RANK() OVER (<RANKING_SCHEME>) AS <ALIAS>,

RANK() OVER (<RANKING_SCHEME>) AS <ALIAS>

FROM <TABLE>

 

The 'RANK' function adds a rank values column to the quieried table indicating a record's ranking in the data according to the indicated <RANKING_SCHEME>, while 'ORDER BY' presents the data according to the <SORT_COL> and <SORT_KEYWORD>. However, it is acceptable to define the <RANKING_SCHEME> with an 'ORDER_BY'.

SELECT <COLS>,

RANK() OVER (ORDER BY <SORT_COL> <SORT_KEYWORD>) AS <ALIAS>

FROM <TABLE>

 

'RANK' can also use the 'PARTITION BY' keyword in the 'OVER' clause, which ranks records among an indicated subset of records. In other words, the records are grouped into subsets by the unique values in the indicated <PART_COL> (similar to using a 'WHERE' clause), and the <RANKING_SCHEME> is iteratively applied to each subset.

SELECT <COLS>,

RANK() OVER (PARTITION BY <PART_COL> <RANKING_SCHEME>) AS <ALIAS>

FROM <TABLE>

 

The 'RANK' function proves utile in data exploration when retrieving top or bottom values (highest values, lowest values) more efficiently than layered 'ORDER BY', 'GROUP BY', and 'TOP'/'LIMIT'/'FETCH FIRST' can.

 

Note that 'RANK' will give tied values the same rank value, but will skip however many ranking values were tied when indexing the output values.

id

employee

age

rank

1

Ben Smith

42

1

2

Sam Gomez

34

2

3

Alex Tan

34

2

4

Sarah Lewis

24

4



ROW_NUMBER

The 'ROW_NUMBER' function is similar to 'RANK' in that values are ranked among all records in the data set, but it assigns a row number indicating its placement in a ranked list instead of its actual ranking value. Like 'RANK', 'ROW_NUMBER' requires an OVER clause to define the <RANKING_SCHEME> and an <ALIAS> to title the ranking column in the query output.

SELECT <COLS>,

ROW_NUMBER() OVER (<RANKING_SCHEME>) AS <ALIAS>

FROM <TABLE>

 

If there are tied value in the <RANKING_SCHEME> column, 'ROW_NUMBER' will assign a ranking row number at random, which can be problematic.

id

employee

age

rank

1

Ben Smith

42

1

2

Sam Gomez

34

2

3

Alex Tan

34

3

4

Sarah Lewis

24

4

 

A solution is to add a secondary ranking scheme that distinguishes tied records. An example with 'ORDER BY' as the <RANKING_SCHEME> demonstrates a possible solution.

SELECT <COLS>,

ROW_NUMBER() OVER (ORDER BY <SORT_COL_0> <SORT_KEYWORD>, <SORT_COL_1> <SORT_KEYWORD>) AS <ALIAS>

FROM <TABLE>

 

Note that attempting this method for breaking ties with 'RANK' will be inaccurate, because 'RANK' will assign rankings based on the combination of the 'ORDER BY' parameters. Rank already handles tied values.

 

The 'PARTITION BY' is keyword is available for the 'ROW_NUMBER' clause.



DENSE_RANK

The 'DENSE_RANK' method combines the best of 'RANK' and 'ROW_NUMBER' in that it ranks values according to the defined <RANK_SCHEME>, accounts for tied values, but does not skip assigned ranking values like 'RANK' inherently does.

SELECT <COLS>,

DENSE_RANK() OVER (<RANKING_SCHEME>) AS <ALIAS>

FROM <TABLE>

 

id

employee

age

rank

1

Ben Smith

42

1

2

Sam Gomez

34

2

3

Alex Tan

34

2

4

Sarah Lewis

24

3

The 'PARTITION BY' is keyword is available for the 'DENSE_RANK' clause.



JOIN



Join Clauses:

 

A blue and black circle with a blue oval

AI-generated content may be incorrect.

 

A diagram of a table

AI-generated content may be incorrect.

 

A blue circle with black text

AI-generated content may be incorrect.

 

A blue circles with black text

AI-generated content may be incorrect.

 

SELECT <COLS>

FROM <LEFT_TABLE>

<JOIN_CLAUSE> <RIGHT_TABLE>

ON <LEFT_TABLE>.<JOIN_KEY>=<RIGHT_TABLE>.<JOIN_KEY>

 

Multiple joins can be performed in a single query. The columns in the 'SELECT' statement must source from one of the JOIN'd tables in the query statement. A 'JOIN' can be performed between a newly JOIN'd table and a previously JOIN'd table in a query statement with multiple joins.

SELECT <COLS>

FROM <LEFT_TABLE>

 

<JOIN_CLAUSE> <RIGHT_TABLE>

ON <LEFT_TABLE>.<JOIN_KEY>=<RIGHT_TABLE>.<JOIN_KEY>

 

<JOIN_CLAUSE> <RIGHT_TABLE>

ON <LEFT_TABLE>.<JOIN_KEY>=<RIGHT_TABLE>.<JOIN_KEY>

 

<JOIN_CLAUSE> <RIGHT_TABLE>

ON <LEFT_TABLE>.<JOIN_KEY>=<RIGHT_TABLE>.<JOIN_KEY>

 

 

JOIN ALIASING

Sometimes multiple tables can have columns with similar names, in which case the columns indicated in the 'SELECT' statement will have to indicate the table from which the query will pull data. This can complicated the SQL syntax, so we will implement aliases to improve readability.

SELECT <ALIAS_0>.<COSL>, <ALIAS_1>.<COLS>, <ALIAS_2>.<COLS>

FROM <LEFT_TABLE> AS <ALIAS_0>

 

<JOIN_CLAUSE> <RIGHT_TABLE> AS <ALIAS_1>

ON <ALIAS_0>.<JOIN_KEY>=<ALIAS_1>.<JOIN_KEY>

 

<JOIN_CLAUSE> <RIGHT_TABLE> AS <ALIAS_2>

ON <ALIAS_0>.<JOIN_KEY>=<ALIAS_2>.<JOIN_KEY>

 

When performing joins, it is good practice to explicitly indicate a column's source table in the 'SELECT' statement and 'JOIN' clause, with or without an alias. This improves the query's readability and reduces any chance for error.

 

 

UNION

The 'UNION' operator combines the results of two 'SELECT' operations into one query output. However, every 'SELECT' operation within the UNION must have the same number of fields, and each like-index field must have similar data types for a 'UNION' to be valid.

SELECT <TABLE_0.COLS>

FROM <TABLE_0>

UNION

SELECT <TABLE_1.COLS>

FROM <TABLE_1>


Clauses like 'WHERE' can be applied to can applied to the individual 'SELECT' operations when selecting data.

Keywords like 'ORDER BY' can be applied to the <UNION> to organize how the query output is presented.

 

EXAMPLE:

 

TABLE_0

id

student_name

age

1

Ben Smith

42

2

Andrew Carter

21

3

Sam Gomez

34



TABLE_1

id

student_name

phone_numer

1

Alex Tan

555-222-2222

2

Sarah Lewis

555-333-3333



QUERY

SELECT id, student_name FROM TABLE_0

WHERE STUDENT_NAME NOT LIKE 'A%'

UNION

SELECT id, student_name FROM TABLE_1

WHERE STUDENT_NAME NOT LIKE 'A%'

ORDER BY STUDENT_NAME DESC



QUERY OUTPUT

id

student_name

2

Sarah Lewis

3

Sam Gomez

1

Ben Smith



ROLLBACK

Sometimes an unintentional transaction (created, updated, or deletion entry, etc.) can be made to a database. SQL has the option to rollback transactions that are not committed, but the database must first be configured to do so.

 

SQL sets auto-commit to 'on' by default, so this can be turned off with the following command:

SET AUTOCOMMIT = OFF;

 

This change to the database now requires users to manually commit changes when a transaction is performed. Transactions can be committed with the following command:ss

COMMIT;

 

If a transaction needs to be undone before it is committed, a user can return to the last commit point with the following command:

ROLLBACK;

 

The database will be restored to its last commit point.

 

 

Helpful Resources