Welcome!

This is meant to be a fun and light introduction to web scraping, geocoding, and map-making.

The motivation

2022 Center City District Sips

  • Summer event featuring happy hour specials every Wednesday evening
  • The website lists all of the restaurants & bars participating in the event
  • There is no map view, which makes it hard to locate a happy hour special nearby

Grid layout of participating restaurants on the website. Each grid card includes an image, address, and link to the specials

Our approach

We’re going to use R tools to build an interactive map!



1. Scraping the data
Scrape restaurants and addresses from the website with rvest



2. Geocoding the addresses
Geocode the restaurant addresses to obtain geographical coordinates with tidygeocoder



3. Building the map
Build an interactive map with leaflet

Packages

Package | Purpose | Version
--- | --- | ---
tidyverse | Data manipulation and iteration functions | 1.3.2.90
here | File referencing in project-oriented workflows | 0.7.13
knitr | Style data frame output into formatted table | 1.40
robotstxt | Check website for scraping permissions | 0.7.13
rvest | Scrape the information off of the website | 1.0.3
tidygeocoder | Geocode the restaurant addresses | 1.0.5
leaflet | Build the interactive map | 2.1.1
leaflet.extras | Add extra functionality to map | 1.0.0
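
The code on the following slides calls functions from these packages without showing the library() calls. A minimal setup sketch, assuming the packages are installed from CRAN; the check-then-attach pattern and the variable names are illustrative additions, not part of the original slides:

```r
# packages listed in the table above
pkgs <- c("tidyverse", "here", "knitr", "robotstxt", "rvest",
          "tidygeocoder", "leaflet", "leaflet.extras")

# find any package that is not installed yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  # report what is missing instead of failing halfway through
  message("Install these first: ", paste(not_installed, collapse = ", "))
} else {
  # attach everything so functions like read_html() and geocode() are available
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```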

Checking site permissions

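We will use the robotstxt package to check the site’s robots.txt file for scraping permissions

We want to see whether any pages are off-limits to bots/scrapers. The output below shows Allow: /, meaning nothing is disallowed. Note, though, that the request for a robots.txt under the page path returned a 404, so this permissive rule is the package’s fallback rather than a file published by the site.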

get_robotstxt("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view")
[robots.txt]
--------------------------------------

# robots.txt overwrite by: on_suspect_content

User-agent: *
Allow: /



[events]
--------------------------------------

requested:   https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt 
downloaded:  https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt 

$on_not_found
$on_not_found$status_code
[1] 404


$on_file_type_mismatch
$on_file_type_mismatch$content_type
[1] "text/html; charset=utf-8"


$on_suspect_content
$on_suspect_content$parsable
[1] FALSE

$on_suspect_content$content_suspect
[1] TRUE


[attributes]
--------------------------------------

problems, cached, request, class
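
The events section above reports a 404 for a robots.txt under the page path, which suggests the permissive rule came from the package's fallback. To see how a rule check behaves on known input, here is a small offline sketch using made-up rules; the rules text and the /private/ path are assumptions for illustration and are not the site's actual robots.txt:

```r
library(robotstxt)

# Illustrative rules only; these are NOT the site's real robots.txt
example_rules <- robotstxt(text = "User-agent: *\nDisallow: /private/")

paths <- c("/explore-center-city/ccd-sips/sips-list-view", "/private/page")

# check one path at a time; TRUE means a generic bot ("*") may crawl it
allowed <- vapply(
  paths,
  function(p) example_rules$check(paths = p, bot = "*"),
  logical(1)
)
allowed
```

For the live site, a call such as paths_allowed(paths = "/explore-center-city/ccd-sips/sips-list-view", domain = "centercityphila.org") would look for robots.txt at the domain root and run the same kind of check; it needs network access, so it is not run here.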

Harvesting data from the first page

We will use the rvest package to scrape all the information we need from each webpage

# define the page
url <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1"

# read the page html
html1 <- read_html(url)

html1
{html_document}
<html lang="en">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
[2] <body class=" \nu-theme-primary\n" data-apos-level="3">\n    \n    \n     ...
# extract table info
table1 <- 
  html1 |> 
  html_element("table") |> 
  html_table()
table1 |> head(3)
Name Address Phone CCD SIPS Specials
1225 Raw Sushi and Sake Lounge 1225 Sansom St, Philadelphia, PA 19102 215.238.1903 CCD SIPS Specials
1518 Bar and Grill 1518 Sansom St, Philadelphia, PA 19102 267.639.6851 CCD SIPS Specials
Air Grille Garden at Dilworth Park 1 S 15th St, Philadelphia, PA 19102 215.587.2761 CCD SIPS Specials
# extract hyperlinks to specific restaurant/bar specials
links <- 
  html1 |> 
  html_elements(".o-table__tag.ccd-text-link") |> 
  html_attr("href") |> 
  as_tibble()
links |> head(3)
value
#1225-raw-sushi-and-sake-lounge
#1518-bar-and-grill
#air-grill-garden-dilworth-park
# add full hyperlinks to the table info
table1Mod <-
  bind_cols(table1, links) |> 
  mutate(Specials = paste0(url, value)) |> 
  select(-c(`CCD SIPS Specials`, value))
table1Mod |> head(3)
Name Address Phone Specials
1225 Raw Sushi and Sake Lounge 1225 Sansom St, Philadelphia, PA 19102 215.238.1903 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge
1518 Bar and Grill 1518 Sansom St, Philadelphia, PA 19102 267.639.6851 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill
Air Grille Garden at Dilworth Park 1 S 15th St, Philadelphia, PA 19102 215.587.2761 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park
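
The same three steps (parse the table, pull the href attributes, paste them onto the page URL) can be tried offline on a toy page. This is a sketch under stated assumptions: the venues, the special-link class, and the example.com URL are all made up for illustration.

```r
library(rvest)

# Toy page with made-up venues; the class name "special-link" is hypothetical
page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Address</th><th>Specials</th></tr>
    <tr><td>Bar A</td><td>1 Main St</td>
        <td><a class="special-link" href="#bar-a">Specials</a></td></tr>
    <tr><td>Bar B</td><td>2 Main St</td>
        <td><a class="special-link" href="#bar-b">Specials</a></td></tr>
  </table>')

# html_element() returns the first match; html_table() parses it into a tibble
venues <- page |> html_element("table") |> html_table()

# html_elements() returns every match; html_attr() pulls the href of each anchor
hrefs <- page |> html_elements("a.special-link") |> html_attr("href")

# hypothetical page URL, used to turn the "#fragment" links into full links
page_url <- "https://example.com/list?page=1"
venues$Specials <- paste0(page_url, hrefs)
venues
```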

Harvesting data from the remaining pages

getTables <- function(pageNumber) {
 
  # wait 2 seconds between each scrape
  Sys.sleep(2)
  
  url <- paste0("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=", pageNumber)
  
  # read the page html
  html <- read_html(url)
  
  # extract table info
  table <- 
    html |> 
    html_element("table") |>
    html_table()
  
  # extract hyperlinks to specific restaurant/bar specials
  links <- 
    html |> 
    html_elements(".o-table__tag.ccd-text-link") |> 
    html_attr("href") |> 
    as_tibble()
  
  # add full hyperlinks to the table info and return the result
  bind_cols(table, links) |> 
    mutate(Specials = paste0(url, value)) |> 
    select(-c(`CCD SIPS Specials`, value))
}
# get remaining tables
table2 <- map_df(2:4, getTables) 

# combine all tables
table <- bind_rows(table1Mod, table2)
# save table with scraped addresses to file
write_rds(table,
          file = here("data", "specialsScraped.Rds"))
table |> head(3)
Name Address Phone Specials
1028 Yamitsuki Sushi & Ramen 1028 Arch Street, Philadelphia, PA 19107 215.629.3888 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen
1225 Raw Sushi and Sake Lounge 1225 Sansom St, Philadelphia, PA 19102 215.238.1903 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge
1518 Bar and Grill 1518 Sansom St, Philadelphia, PA 19102 267.639.6851 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill
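
A request can fail partway through a multi-page scrape. One possible way to keep the remaining pages is to wrap each call in tryCatch(). This is a sketch, not the approach used in the slides: scrape_pages() and fake_scraper() are hypothetical names, and the fake scraper stands in for a function like getTables() so the example runs offline.

```r
# Hypothetical wrapper: scrape each page, skipping pages that raise an error
scrape_pages <- function(pages, scrape_one) {
  results <- lapply(pages, function(p) {
    tryCatch(
      scrape_one(p),
      error = function(e) {
        message("Skipping page ", p, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # drop failed pages, then stack the remaining data frames
  do.call(rbind, Filter(Negate(is.null), results))
}

# Stand-in for a real scraper; page 3 fails on purpose
fake_scraper <- function(page) {
  if (page == 3) stop("simulated network error")
  data.frame(Name = paste("Venue", page), Page = page)
}

scraped <- scrape_pages(2:4, fake_scraper)
scraped
```

With a real scraper, the skipped page numbers printed by message() show which pages to retry.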

To skip this step, load the data from the data/ folder:

table <- read_rds(here("data", "specialsScraped.Rds"))
table |> head(3)
Name Address Phone Specials
1028 Yamitsuki Sushi & Ramen 1028 Arch Street, Philadelphia, PA 19107 215.629.3888 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen
1225 Raw Sushi and Sake Lounge 1225 Sansom St, Philadelphia, PA 19102 215.238.1903 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge
1518 Bar and Grill 1518 Sansom St, Philadelphia, PA 19102 267.639.6851 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill
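
The scrape-once, reload-later pattern can also be wrapped in a small helper. A minimal base R sketch, assuming the result is saved as an .Rds file; cached() and slow_scrape() are hypothetical names, and a temporary folder is used so the example does not touch the project's data/ folder:

```r
# Hypothetical helper: reuse a saved result if present, otherwise compute and save it
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Offline demonstration with a temporary file and a stand-in for the scraper
demo_path <- file.path(tempdir(), "cached-demo", "specialsScraped.Rds")
unlink(demo_path)

calls <- 0
slow_scrape <- function() {
  calls <<- calls + 1  # count how often the expensive step actually runs
  data.frame(Name = "Venue A", Address = "1 Main St")
}

first  <- cached(demo_path, slow_scrape)  # runs slow_scrape() and saves the result
second <- cached(demo_path, slow_scrape)  # reads the saved file instead
```

Files written with write_rds() can be read back with readRDS(), so the helper works with the files saved in the steps above.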

Geocoding

We will use the tidygeocoder package to convert the addresses to geographical coordinates, specifying ArcGIS as the geocoding service.

# Geocoding the addresses
specials <- 
  table |> 
  geocode(address = Address,
          method = "arcgis", 
          long = "Longitude",
          lat = "Latitude")

specials |> head(3)
Name Address Phone Specials Latitude Longitude
1028 Yamitsuki Sushi & Ramen 1028 Arch Street, Philadelphia, PA 19107 215.629.3888 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen 39.95342 -75.15750
1225 Raw Sushi and Sake Lounge 1225 Sansom St, Philadelphia, PA 19102 215.238.1903 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge 39.94976 -75.16089
1518 Bar and Grill 1518 Sansom St, Philadelphia, PA 19102 267.639.6851 https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill 39.95024 -75.16664
# save table with geocoded addresses to file
write_rds(specials,
          file = here("data", "specialsGeocoded.Rds"))

What about other geocoding services?

I found some variation between different geocoding services!

I tried 3 services that don’t require an API key and one for which I already had an API key.

Service | API key required | Time to run | Unmatched addresses
--- | --- | --- | ---
ArcGIS (arcgis) | No | 22.4 secs | 0/60
Nominatim (osm) | No | 60.2 secs | 4/60
Census (census) | No | 4.6 secs | 2/60
Google (google) | Yes | 9 secs | 0/60

Name | Address | Latitude | Longitude
--- | --- | --- | ---
Bud & Marilyn’s | 1234 Locust St, Philadephia, PA 19107 | NA | NA
Con Murphy’s Irish Pub | 1700 Ben Franklin Pkwy, Philadelphia, PA 19103 | NA | NA
Independence Beer Garden | 100 S Independence Mall W, Philadelphia, PA 19106 | NA | NA
U-Bahn | 1320 Chestnut St, Philadephia, PA 19107 | NA | NA

Name | Address | Latitude | Longitude
--- | --- | --- | ---
City Tap House Logan Square | 2 Logan Square, Philadelphia, PA 19103 | NA | NA
Con Murphy’s Irish Pub | 1700 Ben Franklin Pkwy, Philadelphia, PA 19103 | NA | NA
The Wayward | 1170 Ludlow St., Philadelphia, PA | NA | NA
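
Unmatched addresses like the ones listed above show up as NA coordinates in the geocoder output. A minimal base R sketch of how they could be pulled out and counted; the data frame below is made up for illustration and only mimics the Latitude and Longitude columns used earlier:

```r
# Made-up results standing in for geocoder output; NA marks an unmatched address
geocoded <- data.frame(
  Name      = c("Venue A", "Venue B", "Venue C"),
  Latitude  = c(39.9534, NA, 39.9502),
  Longitude = c(-75.1575, NA, -75.1666)
)

# rows where either coordinate is missing
unmatched <- geocoded[is.na(geocoded$Latitude) | is.na(geocoded$Longitude), ]

# same "unmatched/total" format as the comparison table
summary_label <- sprintf("%d/%d", nrow(unmatched), nrow(geocoded))
summary_label
```

Inspecting the unmatched rows is often the quickest way to spot an address that needs cleaning before trying another service.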

Building the map

We can build the interactive map with the leaflet package

  • interactive
  • highly customizable
  • mobile-friendly

CSS tip

Use a css chunk in your Quarto (.qmd) or R Markdown (.Rmd) file to import the font face(s) and weights you want your map to use (e.g. Red Hat Text)

@import url('https://fonts.googleapis.com/css2?family=Red+Hat+Text:ital,wght@0,300;0,400;1,300;1,400&display=swap');

Plotting the restaurants/bars

leaflet(
  data = specials,
  options = leafletOptions(minZoom = 15,
                           maxZoom = 19)) |> 
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude,
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    )

Adding the map background

leaflet(
  data = specials, 
  options = leafletOptions(minZoom = 15,
                           maxZoom = 19)) |> 
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude, 
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron)

Setting the map view

leaflet(
  data = specials, 
  options = leafletOptions(minZoom = 15,
                           maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude, 
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16)
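
setView() centers the map on the mean coordinates at a fixed zoom. As an alternative sketch (not the approach used in these slides), leaflet's fitBounds() picks a view that contains all the points. The three coordinate pairs below are copied from the geocoded output shown earlier; map_fit and pts are illustrative names.

```r
library(leaflet)

# three venues and coordinates taken from the geocoded output above
pts <- data.frame(
  Name      = c("1028 Yamitsuki Sushi & Ramen",
                "1225 Raw Sushi and Sake Lounge",
                "1518 Bar and Grill"),
  Latitude  = c(39.95342, 39.94976, 39.95024),
  Longitude = c(-75.15750, -75.16089, -75.16664)
)

map_fit <- leaflet(data = pts) |>
  addProviderTiles(providers$CartoDB.Positron) |>
  addCircles(lng = ~Longitude, lat = ~Latitude, label = ~Name) |>
  # bounding box from the south-west corner to the north-east corner
  fitBounds(lng1 = min(pts$Longitude), lat1 = min(pts$Latitude),
            lng2 = max(pts$Longitude), lat2 = max(pts$Latitude))
map_fit
```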

Adding fullscreen control

leaflet(
  data = specials,
  options = leafletOptions(minZoom = 15,
                           maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude, 
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |>
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()

Customizing map markers: styling

# marker for the restaurants/bars
popInfoCircles <- paste0(
  "<h2 style='font-family: Red Hat Text, sans-serif; font-size: 1.6em; color:#43464C;'>", 
  "<a style='color: #00857A;' href='", specials$Specials, "'>", specials$Name, "</a></h2>",
  "<p style='font-family: Red Hat Text, sans-serif; font-weight: normal; font-size: 1.5em; color:#9197A6;'>", specials$Address, "</p>"
  )

In formatted HTML

<!--heading: restaurant/bar name-->
<h2 
style="font-family: Red Hat Text, sans-serif; font-size: 1.6em; color:#43464C;">
  <!--hyperlinks: link to the special-->
  <a style="color: #00857A;" 
  href="https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen">
  1028 Yamitsuki Sushi &amp; Ramen 
  </a>
</h2>

<!--paragraph: address-->
<p 
style="font-family: Red Hat Text, sans-serif; font-weight: normal; font-size: 1.5em; color:#9197A6;">
1028 Arch Street, Philadelphia, PA 19107  
</p>
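
The formatted HTML above shows the ampersand in the venue name written as &amp;amp;. When building popup strings from scraped text, one way to get that escaping is htmlEscape() from the htmltools package. The helper below is a hypothetical sketch with trimmed-down styles, not the code used for the map:

```r
library(htmltools)

# Hypothetical helper that builds one popup string per venue
make_popup <- function(name, address, link) {
  sprintf(
    paste0(
      "<h2 style='font-family: Red Hat Text, sans-serif;'>",
      "<a style='color: #00857A;' href='%s'>%s</a></h2>",
      "<p style='font-family: Red Hat Text, sans-serif;'>%s</p>"
    ),
    htmlEscape(link, attribute = TRUE),  # escape for use inside an attribute
    htmlEscape(name),                    # "&" becomes "&amp;"
    htmlEscape(address)
  )
}

popup <- make_popup(
  name    = "1028 Yamitsuki Sushi & Ramen",
  address = "1028 Arch Street, Philadelphia, PA 19107",
  link    = "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen"
)
popup
```

Because sprintf() is vectorized, the same helper could be called with whole columns such as specials$Name, specials$Address, and specials$Specials.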

Customizing map markers: map

leaflet(data = specials, 
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude,
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6, 
    stroke = FALSE,
    radius = 12,
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |> 
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()

Adding a marker at the center: styling

# marker for the center of the map
popInfoMarker <- paste(
  "<h1 style='padding-top: 0.5em; margin-top: 1em; margin-bottom: 0.5em; font-family: Red Hat Text, sans-serif; font-size: 1.8em; color:#43464C;'>", 
  "<a style='color: #00857A;' href='https://centercityphila.org/explore-center-city/ccdsips'>",
  "Center City District Sips 2022", "</a>",
  "</h1>",
  "<p style='color:#9197A6; font-family: Red Hat Text, sans-serif; font-size: 1.5em; padding-bottom: 1em;'>", 
  "Philadelphia, PA", "</p>"
  )

Making an awesome icon for the center

# custom icon for the center of the map
centerIcon <-
  makeAwesomeIcon(
    icon = "map-pin",
    iconColor = "#FFFFFF", # accepts any CSS color
    markerColor = "darkblue", # must be one of the preset marker color names
    library = "fa"
  )

Adding a marker at the center: map

leaflet(data = specials, 
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude, 
    lng = ~ Longitude, 
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6, 
    stroke = FALSE,
    radius = 12,
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |> 
  # add fullscreen control button
  leaflet.extras::addFullscreenControl() |> 
  # add marker at the center
  addAwesomeMarkers(
    icon = centerIcon,
    lng = mean(specials$Longitude), 
    lat = mean(specials$Latitude), 
    label = "Center City District Sips 2022",
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      ),
    popup = popInfoMarker,
    popupOptions = popupOptions(maxWidth = 250))

Estimated map use

Nearly 3200 views from 2300 U.S. visitors! 🤯

Analytics dashboard showing the time period of Center City Sips 2022, June 1 to August 31, 2022. Data is filtered down to show only visits to the blog post hosting the Center City Sips map, and to visitors from the US. There were close to 3200 views from 2300 visitors. Most visits were in June, followed by July, and finally August.

Umami dashboard for silviacanelon.com