15 Sep

SQL, which stands for Structured Query Language, is a language for interacting with data stored in something called a relational database.

You can think of a relational database as a collection of tables. A table is just a set of rows and columns which represents exactly one type of entity. For example, a table might represent employees in a company or purchases made, but not both.

Each row, or record, of a table contains information about a single entity. For example, in a table representing employees, each row represents a single person. Each column, or field, of a table contains a single attribute for all rows in the table. 

Selecting columns. While SQL can be used to create and modify databases, the focus of this course will be querying databases. A query is a request for data from a database table (or combination of tables).

  1. Single column. In SQL, you can select data from a table using a SELECT statement. For example, the following query selects the name column from the people table. In this query, SELECT and FROM are called keywords.
    SELECT name FROM people;
  2. Multiple columns. 
    • To select multiple columns from a table, simply separate the column names with commas!
      • SELECT name, birthdate FROM people;
    • You may want to select all columns from a table, there's a handy shortcut:
      • SELECT * FROM people;
    • If you only want to return a certain number of results, you can use the LIMITkeyword to limit the number of rows returned:
      • SELECT * FROM people LIMIT 10;
  3. Unique values.  If you want to select all the unique values from a column, you can use the DISTINCT keyword.
    • SELECT DISTINCT language FROM films;
  4. Count.The COUNT statement returns the number of rows in one or more columns.
    • This code gives the number of rows in the peopletable:
      • SELECT COUNT(*) FROM people;
    •  If you want to count the number of non-missing values in a particular column, you can call COUNT on just that column.
      • SELECT COUNT(col_name) FROM my_table;
    • It's also common to combine COUNT with DISTINCT to count the number of distinct values in a column.
      • SELECT COUNT(DISTINCT birthdate) FROM people;
  5. Observations:
    •  It's good practice to make SQL keywords uppercase to distinguish them from other parts of your query.
    • It's also good practice to include a semicolon at the end of your query. This tells SQL where the end of your query is!


Filtering results.

In SQL, the WHERE keyword allows you to filter based on both text and numeric values in a table. There are a few different comparison operators you can use:

  • = equal
  • <> not equal
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to

Important: in PostgreSQL (the version of SQL we're using), you must use single quotes with WHERE. Aggregate functions can't be used in WHERE clauses.

  1. One condition 
    • Text records.The following code returns all films with the title 'Metropolis':(The WHERE clause always comes after the FROM statement).
      • SELECT title FROM films WHERE title = 'Metropolis';
    • Numeric values. The following query selects all details for films with a budget over ten thousand dollars:
      • SELECT * FROM films WHERE budget > 10000;
  2. Combining conditions.
    • WHERE AND. You can build up your WHERE queries by combining multiple conditions with the AND keyword.
      • SELECT title FROM films WHERE release_year > 1994 AND release_year < 2000;
      • Note that you need to specify the column name separately for every AND condition.You can add as many AND conditions as you need!
    • WHERE AND OR.What if you want to select rows based on multiple conditions where some but not all of the conditions need to be met? For this, SQL has the OR operator.When combining AND and OR, be sure to enclose the individual clauses in parentheses, like so:
      • SELECT title FROM films WHERE (release_year = 1994 OR release_year = 1995) AND (certification = 'PG' OR certification = 'R');
    • BETWEEN.  
      • You can use the following query to get titles of all films released in and between 1994 and 2000:
        • SELECT title FROM films WHERE release_year >= 1994 AND release_year <= 2000;
      • Checking for ranges like this is very common, so in SQL the BETWEEN keyword provides a useful shorthand for filtering values within a specified range. This query is equivalent to the one above:
        • SELECT title FROM films WHERE release_year BETWEEN 1994 AND 2000;
        • Note: BETWEEN is inclusive
    • IN. The IN operator allows you to specify multiple values in a WHERE clause. If you want to filter based on many conditions, WHERE can get unwieldy. For example, using OR's, and then IN:
      • SELECT name FROM kids WHERE age = 2 OR age = 4 OR age = 6 OR age = 8 OR age = 10;
      • SELECT name FROM kids WHERE age IN (2, 4, 6, 8, 10);
    • NULL and IS NULL. NULL represents a missing or unknown value. You can check for NULL values using the expression IS NULL.
      • IS NULL is useful when combined with WHERE to figure out what data you're missing.
        • SELECT COUNT(*) FROM people WHERE birthdate IS NULL;
      • You can use the IS NOT NULL operator if you want to filter out missing values so you only get results which are not NULL.
        • SELECT name FROM people WHERE birthdate IS NOT NULL;
    • LIKE and NOT LIKE.  The LIKE operator can be used in a WHERE clause to search for a pattern in a column.  To accomplish this, you use something called a wildcard as a placeholder for some other values. There are two wildcards you can use with LIKE:
      • %. Will match zero, one, or many characters in text. For example, the following query matches companies like 'Data', 'DataC' 'DataCamp', 'DataMind', and so on:
        • SELECT name FROM companies WHERE name LIKE 'Data%';
      • _.  Will match a single character. For example, the following first query matches companies like 'DataCamp', 'DataComp', and so on. The second query gets the names of people whose names have 'r' as the second letter. 
        • SELECT name FROM companies WHERE name LIKE 'DataC_mp';
        • SELECT name FROM people WHERE name LIKE '_r%'
      • You can also use the NOT LIKE operator to find records that don'tmatch the pattern you specify.


Aggregate functions, Arithmetic and Aliasing.

Often, you will want to perform some calculation on the data in a database. SQL  aggregate functions, to help you out with summarizing your data.

  • For example, the following gives you the average value from the budget column of the films table:
    • SELECT AVG(budget) FROM films;
    • Similarly, MAX, MIN, SUM would give you the maximum, minimum and sum of all observations in the specified column.
  • Combining aggregate functions with WHERE. For example, to get the total budget of movies made in the year 2010 or later:
    • SELECT SUM(budget) FROM films WHERE release_year >= 2010;
  • Remark. SQL basically provides you a bunch of building blocks that you can combine in all kinds of ways. Hence the name: Structured Query Language.
  • Arithmetic. You can perform basic arithmetic with symbols like +, -, *, and /.
    • SELECT (4*3) gives the result of 12.
    • SELECT (4/3)  gives the result of 1 (SQL assumes you want an integer back). To avoid this express it as integer with decimals.
    • SELECT (4.0/3.0) gives the result of 1.333
  • Aliasing. SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword. For ex., the following gives you a result with the specified column names:
    • Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.
      • SELECT title, (gross-budget) AS net_profit FROM films;
    • Get the average duration in hours for all films, aliased asavg_duration_hours.
      • SELECT AVG(duration)/60.0 AS avg_duration_hours FROM films;
    • Get the percentage of people who are no longer alive. Alias the result as percentage_dead.
      • SELECT 100.0*COUNT(deathdate)/COUNT(*) AS percentage_dead FROM people;


Sorting, Grouping and Joins

  • ORDER BY. ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.

    By default ORDER BY will sort in ascending order. If you want to sort the results in descending order, you can use the DESC keyword.

    • The following, gives you the titles of films sorted by release year, from newest to oldest.

      • SELECT title FROM films ORDER BY release_year DESC;
      • Remark. Text values are ordered alphabetically.

    • Get all details for all films except those released in 2015 and order them by duration.
      • SELECT * FROM films WHERE release_year NOT IN ('2015') ORDER BY duration;
    • Sorting multiple columns.ORDER BY can also be used to sort on multiple columns. It will sort by the first column specified, then sort by the next, then the next, and so on. For example, the following sorts on birth dates first (oldest to newest) and then sorts on the names in alphabetical order. The order of columns is important!
      • SELECT birthdate, name FROM people ORDER BY birthdate, name;
      • Remark. The second column you order on only steps in when the first column is not decisive to tell the order. The second column acts as a tie breaker.

  • GROUP BY.Often you'll need to aggregate results, GROUP BY is for performing operations within groups. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, GROUP BY allows you to group a result by one or more columns, like so:

    • Commonly, GROUP BY is used with aggregate functions like COUNT() or MAX(). Note that GROUP BY always goes after the FROM clause!
    • You can combine GROUP BY with ORDER BY to group your results, calculate something about them, and then order your results. For example:
    • Remark. SQL will return an error if you try to SELECT a field that is not in your GROUP BY clause without using it to calculate some kind of value about the entire group.
    • Examples.
      • Grouping by several variables. Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.
        • SELECT release_year, country, MAX(budget) FROM films
          GROUP BY release_year, country
          ORDER BY release_year, country;
  • HAVING. If you want to filter based on the result of an aggregate function, you need the HAVING clause.
    • Aggregate functions can't be used in WHERE clauses. For example, the following query is invalid: 
      • SELECT release_year FROM films GROUP BY release_year WHERE COUNT(title) > 10;
    • Using HAVING. The following shows only those years in which more than 10 films were released.
      • SELECT release_year FROM films GROUP BY release_year HAVING COUNT(title) > 10;
  • Examples.
    • Returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.
      • SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross FROM films
        GROUP BY release_year
        HAVING AVG(budget)>60000000 AND release_year>1990
        ORDER BY avg_gross DESC;
    • Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as avg_budget and avg_gross respectively.
      • SELECT country, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
        FROM films
        GROUP BY country
        HAVING COUNT(title)>10
        ORDER BY country
        LIMIT 5;
  • JOIN. We will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie? We'd want to get the ID of the movie from the filmstable and then use it to get IMDB information from the reviewstable. 
    • The following query gets the IMDB score for the film To Kill a Mockingbird!
      • SELECT title, imdb_score
        FROM films
        JOIN reviews
        ON films.id = reviews.film_id
        WHERE title = 'To Kill a Mockingbird';



Comments
* The email will not be published on the website.
I BUILT MY SITE FOR FREE USING