22 Sep

The focus of this course is using SQL to join two or more database tables together into a single table. INNER JOIN and LEFT JOIN are probably the most common joins

INTRODUCTION TO JOINS

  • INNER JOIN
    • We have matching values from the id field which are colored qith the same color. The id field is known as a key field, since it can be used to reference one table to another; both tables have also another field named val; inner joins only includes records in which the key is in both tables, with inner joins we look for matches in the right table corresponging to all entries in the key field in the left table.


    • Remark. The arrows perfectly indicate which observations are to be added to the new table
    • Example in SQL. The  basic syntax for an INNER JOIN, here including all columns in both tables: 
      • SELECT * FROM left_table INNER JOIN right_table ON left_table.id = right_table.id;



    • Combine Multiple Joins. Combine multiple joins in a single query is a powerful feature of SQL, e.g:
      • SELECT * FROM left_table INNER JOIN right_table ON left_table.id = right_table.id INNER JOIN another_table ON left_table.id = another_table.id;
    • Remark.
      •  Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS to add the alias immediately after the table name with a space.
      • It is perfectly OK if the right table has multiple rows matching the same key field, th resulting table adds this multiple rows.
      • Important. Sometimes yo need to join on multiple variables.
        • The trouble with doing your last join on c.code = e.code and not also including year is that e.g. the 2010 value for fertility_rate is also paired with the 2015 value for unemployment_rate. Fix your previous query: in your last ON clause, use AND to add an additional joining condition. In addition to joining on code in cand e, also join on year in eand p.
    • INNER JOIN via USING. When the variable on which we want to perform the join has the same name on both tables we can specify it with USING. The following commands give the same result.


    • Self Joins. We may want to join a table to itself, the reason is we look for all matches from the left table to the rigt one that matches, generating a new table where we have pairs of the same field that match a condition on another field.Ex.
      •  
        Prime Minister table

      • If we don't want to pair an observation to itself we remove it by
      • Great example with Self Join.
    • CASE WHEN and THEN. When we want to create a new variable in a current table depending on a condition of an existing field we use CASE WHEN and THEN. Ex. (Note the use of ELSE END AS)
      • Obs. Observe the use of  the INTOcommand to create this countries_plus table:
        • SELECT name, continent, code, surface_area,   CASE WHEN surface_area > 2000000 THEN 'large'       WHEN surface_area > 350000  THEN 'medium'       ELSE 'small' END      AS geosize_group INTO countries_plus FROM countries;


Outer Joins and Cross Joins

You'll come to grips with different kinds of outer joins. You'll learn how to gain further insights into your data through left joins, right joins, and full joins. In addition to outer joins, you'll also work with cross joins.

Obs. You can remember outer joins as reaching out to another table while keeping all of the records from the original table. Inner joins keep only the records in both tables. We'll begin exploring the three types of Outer Joins.

  • LEFT JOIN.
    • This join keeps all the records from the left table but those which don't have a match on the key value on the right table have a missing value in the corresponding variables from the right table.


    • The syntax of a left join is as follows
    • Ex.Begin with a left join with the countries table on the left and the economies table on the right.Focus only on records with 2010 as the year.Calculate the average GDP per capita AS avg_gdp for each regionin 2010. Select the regionand avg_gdp fields.Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
  • RIGHT JOIN. Is very similar to the left join, just in this case roles are inverted.
    •  Remark. You can always write a right join as a left join.
  • FULL JOIN. 
    • This join keeps all the records from both the left table and the right table but those which don't have a match on the key value have a missing value in the corresponding variables from corresponding table.

    • Ex. Using the leaders database.
    • Ex. Full joining three tables.
  • CROSS JOIN
    • Create all possible combinations from two tables


    • Remark. Recall that cross joins do not use ON or USING
  • Outer Challenge.In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions. (Use the countries and population tables).



Set theory clauses

You'll learn more about set theory using Venn diagrams and you will be introduced to union, union all, intersect, and except clauses. You'll finish by investigating semi-joins and anti-joins, which provide a nice introduction to subqueries. The clauses are represented by the following Venn Diagrams

  • UNION and UNION ALL.


    • UNION



    • UNION ALL
    • Remark. The fields you select in the SELECT clause, are the fields that work as the id for the union of the tables
  • INTERSECT
    • You can intersect on any quantity of fields. For ex. 1 and 2


  • EXCEPT.
    • Ex.
  • Semi-joins and Anti-joins. These are an introduction to Subqueries. These are used as follows by the diagrams.
    • Semi-join
    • Anti-join


Subqueries

A subquery is a SQL query within a query, i.e, subqueries are nested queries that provide data to the enclosing query. You can have subqueries inside WHERE, SELECT, FROM.

  • Inside WHERE.
    • Asia countries  below average fert_rate.
    • Another example.
  • Inside SELECT
    • The following using an INNER JOIN, is equivalent to th second query
    • Counting the number of countries that appear in each continent (that are in the prime_minister table) at states table.
  • Inside FROM
    • We hace the percentage of women in the parlament for each country in "states", and we want the maximum percentage for each continent appearing in the monarchs table
    • Number of languages spoken for each country, with the number of the country appearing just like in the "local_name" field of "countries".
    • Identifying each continent's country with the highest inflation_rate using joins and multiple subqueries.
  • Challenges.



Comments
* The email will not be published on the website.
I BUILT MY SITE FOR FREE USING