Introduction to SQLite

Eric Burden

05/31/2019

The Basics

What is SQL?

Resource: SQLCourse

  • SQL (pronounced “ess-que-el”) stands for Structured Query Language.
  • It is the standard language for relational database management systems.
  • Used by: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, PostgreSQL, MySQL, etc.
  • The standard SQL commands such as “Select”, “Insert”, “Update”, “Delete”, “Create”, and “Drop” can accomplish almost everything that one needs to do with a database.

What is SQLite?

Resource: SQLite.org

  • SQLite is an embedded SQL database engine.
  • Unlike most other SQL databases, SQLite does not have a separate server process.
  • SQLite reads and writes directly to ordinary disk files.
  • A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Downloading SQLite

Resource: sqlitetutorial.net

Installing SQLite

  • Linux (Debian-based): sudo apt install sqlite3
  • Mac: brew install sqlite
  • Windows:
    • Unzip the folder downloaded to C:(or, somewhere)
    • You can double-click the sqlite3.exe to run it, or
    • cmd prompt > cd C:\sqlite > sqlite3
    • Also, there’s a GUI Tool

SQLite GUI Tool

Resource: SQLite Browser

Importing Data into SQLite (GUI)

Step 1

Resource: SQLite Browser

  1. Create a new database if you don’t have one open:

Step 2

Resource: SQLite Browser

  1. File Menu > Import > Table from CSV file…::

Step 3

Resource: SQLite Browser

  1. Configure import settings, then save:

Importing Data into SQLite (Console)

It’s Pretty Easy

> sqlite3 db_name
> .mode csv
> .import path/to/file table_name

The rest of my examples will all use the console:

  • Windows: PowerShell
  • Linux: Terminal (or whatever you prefer)
  • Mac: Mac Terminal (or whatever you prefer)

Our Dataset

nycflights13::airlines

dplyr::glimpse(nycflights13::airlines)
## Observations: 16
## Variables: 2
## $ carrier <chr> "9E", "AA", "AS", "B6", "DL", "EV", "F9", "FL", "HA", ...
## $ name    <chr> "Endeavor Air Inc.", "American Airlines Inc.", "Alaska...

nycflights13::airports

dplyr::glimpse(nycflights13::airports)
## Observations: 1,458
## Variables: 8
## $ faa   <chr> "04G", "06A", "06C", "06N", "09J", "0A9", "0G6", "0G7", ...
## $ name  <chr> "Lansdowne Airport", "Moton Field Municipal Airport", "S...
## $ lat   <dbl> 41.13047, 32.46057, 41.98934, 41.43191, 31.07447, 36.371...
## $ lon   <dbl> -80.61958, -85.68003, -88.10124, -74.39156, -81.42778, -...
## $ alt   <int> 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 409,...
## $ tz    <dbl> -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -5, ...
## $ dst   <chr> "A", "A", "A", "A", "A", "A", "A", "A", "U", "A", "A", "...
## $ tzone <chr> "America/New_York", "America/Chicago", "America/Chicago"...

nycflights13::flights

dplyr::glimpse(nycflights13::flights)
## Observations: 336,776
## Variables: 19
## $ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,...
## $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 55...
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 60...
## $ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2...
## $ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 7...
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 7...
## $ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -...
## $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV",...
## $ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79...
## $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN...
## $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR"...
## $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL"...
## $ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138...
## $ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 94...
## $ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5,...
## $ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013...

nycflights13::planes

dplyr::glimpse(nycflights13::planes)
## Observations: 3,322
## Variables: 9
## $ tailnum      <chr> "N10156", "N102UW", "N103US", "N104UW", "N10575",...
## $ year         <int> 2004, 1998, 1999, 1999, 2002, 1999, 1999, 1999, 1...
## $ type         <chr> "Fixed wing multi engine", "Fixed wing multi engi...
## $ manufacturer <chr> "EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE"...
## $ model        <chr> "EMB-145XR", "A320-214", "A320-214", "A320-214", ...
## $ engines      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
## $ seats        <int> 55, 182, 182, 182, 55, 182, 182, 182, 182, 182, 5...
## $ speed        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ engine       <chr> "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turbo-fan...

nycflights13::weather

dplyr::glimpse(nycflights13::weather)
## Observations: 26,115
## Variables: 15
## $ origin     <chr> "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "E...
## $ year       <dbl> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 201...
## $ month      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ day        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ hour       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, ...
## $ temp       <dbl> 39.02, 39.02, 39.02, 39.92, 39.02, 37.94, 39.02, 39...
## $ dewp       <dbl> 26.06, 26.96, 28.04, 28.04, 28.04, 28.04, 28.04, 28...
## $ humid      <dbl> 59.37, 61.63, 64.43, 62.21, 64.43, 67.21, 64.43, 62...
## $ wind_dir   <dbl> 270, 250, 240, 250, 260, 240, 240, 250, 260, 260, 2...
## $ wind_speed <dbl> 10.35702, 8.05546, 11.50780, 12.65858, 12.65858, 11...
## $ wind_gust  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ precip     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ pressure   <dbl> 1012.0, 1012.3, 1012.5, 1012.2, 1011.9, 1012.4, 101...
## $ visib      <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ time_hour  <dttm> 2013-01-01 01:00:00, 2013-01-01 02:00:00, 2013-01-...

Import our Data

> sqlite3 nycflights13.sqlite
> .mode csv
> .import airlines.csv airlines
> .import airports.csv airports
> .import flights.csv flights
> .import planes.csv planes
> .import weather.csv weather

SQL Commands

Select All

select *
from flights
## Observations: 336,776
## Variables: 20
## $ ..1            <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "1...
## $ year           <chr> "2013", "2013", "2013", "2013", "2013", "2013",...
## $ month          <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1...
## $ day            <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1...
## $ dep_time       <chr> "517", "533", "542", "544", "554", "554", "555"...
## $ sched_dep_time <chr> "515", "529", "540", "545", "600", "558", "600"...
## $ dep_delay      <chr> "2", "4", "2", "-1", "-6", "-4", "-5", "-3", "-...
## $ arr_time       <chr> "830", "850", "923", "1004", "812", "740", "913...
## $ sched_arr_time <chr> "819", "830", "850", "1022", "837", "728", "854...
## $ arr_delay      <chr> "11", "20", "33", "-18", "-25", "12", "19", "-1...
## $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV",...
## $ flight         <chr> "1545", "1714", "1141", "725", "461", "1696", "...
## $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN...
## $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR"...
## $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL"...
## $ air_time       <chr> "227", "227", "160", "183", "116", "150", "158"...
## $ distance       <chr> "1400", "1416", "1089", "1576", "762", "719", "...
## $ hour           <chr> "5", "5", "5", "5", "6", "5", "6", "6", "6", "6...
## $ minute         <chr> "15", "29", "40", "45", "0", "58", "0", "0", "0...
## $ time_hour      <chr> "2013-01-01 05:00:00", "2013-01-01 05:00:00", "...

Limited Select

select origin
  ,dest
  ,carrier
  ,flight
from flights
limit 5
5 records
origin dest carrier flight
EWR IAH UA 1545
LGA IAH UA 1714
JFK MIA AA 1141
JFK BQN B6 725
LGA ATL DL 461

Conditional Select

select year || '-' || month || '-' || day as date 
  ,origin
  ,dest
  ,carrier
  ,flight
from flights
where dest = "MIA"
limit 5
5 records
date origin dest carrier flight
2013-1-1 JFK MIA AA 1141
2013-1-1 EWR MIA AA 1895
2013-1-1 EWR MIA UA 1077
2013-1-1 LGA MIA AA 1837
2013-1-1 LGA MIA DL 2003

Ordered Select

select year || '-' || month || '-' || day as date 
  ,origin
  ,dest
  ,carrier
  ,flight
from flights
where dest = "MIA"
order by year desc
  ,month desc
  ,day desc
limit 5
5 records
date origin dest carrier flight
2013-9-9 JFK MIA AA 2243
2013-9-9 LGA MIA AA 1175
2013-9-9 EWR MIA AA 1205
2013-9-9 LGA MIA DL 2003
2013-9-9 JFK MIA AA 1345

Aggregate Select

select carrier
  ,count(*) as num_flights
from flights
group by carrier
order by num_flights desc
limit 5
5 records
carrier num_flights
UA 58665
B6 54635
EV 54173
DL 48110
AA 32729

Combinations (Joins)

select name
  ,count(*) as num_flights
from flights
left join airlines
  on airlines.carrier = flights.carrier
group by flights.carrier
order by num_flights desc
limit 5
5 records
name num_flights
United Air Lines Inc. 58665
JetBlue Airways 54635
ExpressJet Airlines Inc. 54173
Delta Air Lines Inc. 48110
American Airlines Inc. 32729

A Little More About Joins

Inner Join
Left Outer Join
Cross Join

Rain and Arrival Delays

select wp.precip_cat
  ,avg(arr_delay) as avg_arr_delay
from flights
left join (
  select year
    ,month
    ,day
    ,case
      when precip = 0 then '0'
      when precip > 0 and precip <= .0045 then '<= .0045'
      when precip > .0045 and precip <= 1 then '.0045 - 1'
      when precip > 1 then '>1'
      end as precip_cat
  from weather
) wp
  on wp.year = flights.year
    and wp.month = flights.month
    and wp.day = flights.day
group by wp.precip_cat
order by avg_arr_delay desc
4 records
precip_cat avg_arr_delay
>1 28.385350
.0045 - 1 22.871716
NA 6.076031
0 5.523218

Exporting from SQLite

GUI - Step 1

Resource: SQLite Browser

  1. Save query as a new table:

GUI - Step 2

Resource: SQLite Browser

  1. File Menu > Export > Table(s) as CSV file…:

GUI - Step 3

Resource: SQLite Browser

  1. Configure settings and save table:

Using the Console

> sqlite3 nycflights13.csv
> .mode csv
> .output miami_flights.csv
> select * from flights where dest = "MIA";
> .output stdout

Additional Resources

Database File (DBHub.io)

Dataset Documentation

A Really Neat Tool