SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.
You can think of a relational database as a collection of tables. A table is just a set of rows and columns which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.
Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table.
Selecting columns. While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables).
SELECT
statement. For example, the following query selects the name
column from the people
table. In this query, SELECT
and FROM
are called keywords.SELECT name FROM people;
SELECT name, birthdate FROM people;
SELECT * FROM people;
LIMIT
keyword to limit the number of rows returned:SELECT * FROM people LIMIT 10;
DISTINCT
keyword.SELECT DISTINCT language FROM films;
COUNT
statement returns the number of rows in one or more columns.people
table:SELECT COUNT(*) FROM people;
COUNT
on just that column.SELECT COUNT(col_name) FROM my_table;
COUNT
with DISTINCT
to count the number of distinct values in a column.SELECT COUNT(DISTINCT birthdate) FROM people;
Filtering results.
In SQL, the WHERE
keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:
=
equal<>
not equal<
less than>
greater than<=
less than or equal to>=
greater than or equal toImportant: in PostgreSQL (the version of SQL we're using), you must use single quotes with WHERE
. Aggregate functions can't be used in WHERE clauses.
'Metropolis'
:(The WHERE
clause always comes after the FROM
statement).SELECT title FROM films WHERE title = 'Metropolis';
SELECT * FROM films WHERE budget > 10000;
WHERE
queries by combining multiple conditions with the AND
keyword.SELECT title FROM films WHERE release_year > 1994 AND release_year < 2000;
AND
condition.You can add as many AND
conditions as you need!OR
operator.When combining AND
and OR
, be sure to enclose the individual clauses in parentheses, like so:SELECT title FROM films WHERE (release_year = 1994 OR release_year = 1995) AND (certification = 'PG' OR certification = 'R');
SELECT title FROM films WHERE release_year >= 1994 AND release_year <= 2000;
BETWEEN
keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:SELECT title FROM films WHERE release_year BETWEEN 1994 AND 2000;
BETWEEN
is inclusiveIN
operator allows you to specify multiple values in a WHERE
clause. If you want to filter based on many conditions, WHERE
can get unwieldy. For example, using OR's, and then IN:SELECT name FROM kids WHERE age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10;
SELECT name FROM kids WHERE age IN (2, 4, 6, 8, 10);
NULL
represents a missing or unknown value. You can check for NULL
values using the expression IS NULL
.IS NULL
is useful when combined with WHERE
to figure out what data you're missing.SELECT COUNT(*) FROM people WHERE birthdate IS NULL;
IS NOT NULL
operator if you want to filter out missing values so you only get results which are not NULL
.SELECT name FROM people WHERE birthdate IS NOT NULL;
LIKE
operator can be used in a WHERE
clause to search for a pattern in a column. To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE
:'Data'
, 'DataC'
'DataCamp'
, 'DataMind'
, and so on:SELECT name FROM companies WHERE name LIKE 'Data%';
'DataCamp'
, 'DataComp'
, and so on. The second query gets the names of people whose names have 'r' as the second letter. SELECT name FROM companies WHERE name LIKE 'DataC_mp';
SELECT name FROM people WHERE name LIKE '_r%'
NOT LIKE
operator to find records that don'tmatch the pattern you specify.Aggregate functions, Arithmetic and Aliasing.
Often, you will want to perform some calculation on the data in a database. SQL aggregate functions, to help you out with summarizing your data.
budget
column of the films
table:SELECT AVG(budget) FROM films;
SELECT SUM(budget) FROM films WHERE release_year >= 2010;
+
, -
, *
, and /.
AS
keyword. For ex., the following gives you a result with the specified column names:net_profit
.avg_duration_hours
.people
who are no longer alive. Alias the result as percentage_dead
.Sorting, Grouping and Joins
ORDER BY
keyword is used to sort results in ascending or descending order according to the values of one or more columns.By default ORDER BY
will sort in ascending order. If you want to sort the results in descending order, you can use the DESC
keyword.
The following, gives you the titles of films sorted by release year, from newest to oldest.
SELECT title FROM films ORDER BY release_year DESC;
Remark. Text values are ordered alphabetically.
ORDER BY
can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example, the following sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!SELECT birthdate, name FROM people ORDER BY birthdate, name;
GROUP BY.Often you'll need to aggregate results, GROUP BY is for performing operations within groups. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY
allows you to group a result by one or more columns, like so:
GROUP BY
is used with aggregate functions like COUNT()
or MAX()
. Note that GROUP BY
always goes after the FROM
clause!GROUP BY
with ORDER BY
to group your results, calculate something about them, and then order your results. For example:SELECT
a field that is not in your GROUP BY
clause without using it to calculate some kind of value about the entire group.HAVING
clause.WHERE
clauses. For example, the following query is invalid: SELECT release_year FROM films GROUP BY release_year WHERE COUNT(title) > 10;
SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 10;
avg_budget
and avg_gross
respectively.films
table and then use it to get IMDB information from the reviews
table.