rdflib - package review

Reviewer: @annakrystalli

Review Submitted: 2018-01-30




This report documents the review of the rOpenSci submitted package:

rdflib (ropensci/onboarding issue #169).


Package info

Description:

The Resource Description Framework, or ‘RDF’ is a widely used data representation model that forms the cornerstone of the Semantic Web. ‘RDF’ represents data as a graph rather than the familiar data table or rectangle of relational databases. The ‘rdflib’ package provides a friendly and concise user interface for performing common tasks on ‘RDF’ data, such as reading, writing and converting between the various serializations of ‘RDF’ data, including ‘rdfxml’, ‘turtle’, ‘nquads’, ‘ntriples’, ‘trig’, and ‘json-ld’; creating new ‘RDF’ graphs, and performing graph queries using ‘SPARQL’. This package wraps the low level ‘redland’ R package which provides direct bindings to the ‘redland’ C library. Additionally, the package supports the newer and more developer friendly ‘JSON-LD’ format through the ‘jsonld’ package. The package interface takes inspiration from the Python ‘rdflib’ library.

Author: Carl Boettiger cboettig@gmail.com [aut, cre, cph] (0000-0002-1642-628X)

repo url: https://github.com/cboettig/rdflib

website url: https://cboettig.github.io/rdflib/

Review info

See reviewer guidelines for further information on the rOpenSci review process.

key review checks:

  • Does the code comply with general principles in the Mozilla reviewing guide?
  • Does the package comply with the rOpenSci packaging guide?
  • Are there improvements that could be made to the code style?
  • Is there code duplication in the package that should be reduced?
  • Are there user interface improvements that could be made?
  • Are there performance improvements that could be made?
  • Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?

Please be respectful and kind to the authors in your reviews. The rOpenSci code of conduct is mandatory for everyone involved in our review process.


session info

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] jsonlite_1.5   SPARQL_1.16    RCurl_1.95-4.8 bitops_1.0-6   XML_3.98-1.9   rdflib_0.0.3  
[7] magrittr_1.5  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15       xmlparsedata_1.0.1 compiler_3.4.3     pillar_1.1.0       base64enc_0.1-3   
 [6] remotes_1.1.0      tools_3.4.3        digest_0.6.14      praise_1.0.0       memoise_1.1.0     
[11] evaluate_0.10.1    tibble_1.4.2       pkgconfig_2.0.1    rlang_0.1.6        rex_1.1.2         
[16] whoami_1.1.2       rstudioapi_0.7     commonmark_1.4     curl_3.1           yaml_2.1.16       
[21] cyclocomp_1.1.0    roxygen2_6.0.1     stringr_1.2.0      pkgreviewr_0.1.0   withr_2.1.1.9000  
[26] jqr_1.0.0          httr_1.3.1         knitr_1.18         xml2_1.2.0         desc_1.1.1        
[31] devtools_1.13.4    hms_0.4.0          redland_1.0.17-9   jsonld_1.2         rprojroot_1.3-2   
[36] R6_2.2.2           rcmdcheck_1.2.1    rmarkdown_1.8      callr_1.0.0        readr_1.1.1       
[41] lintr_1.0.2        covr_3.0.1         backports_1.1.2    clisymbols_1.2.0   htmltools_0.3.6   
[46] rsconnect_0.8.5    assertthat_0.2.0   goodpractice_1.0.0 V8_1.5             stringi_1.1.6     
[51] lazyeval_0.2.1     crayon_1.3.4      

Test installation

test install of rdflib from GitHub with:

devtools::install_github("cboettig/rdflib", force = T, dependencies = T)
Downloading GitHub repo cboettig/rdflib@master
from URL https://api.github.com/repos/cboettig/rdflib/zipball/master
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmptt5hxc/devtools671d75f53311/cboettig-rdflib-2c150e2'  \
  --library='/Users/Anna/Library/R/3.4/library' --install-tests 

* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
remove.packages("rdflib")
Removing package from ‘/Users/Anna/Library/R/3.4/library’
(as ‘lib’ is unspecified)

comments:


test local rdflib install:

pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
  --no-resave-data --no-manual 
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd) 
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted
Error: Command failed (1)
pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
  --library='/Users/Anna/Library/R/3.4/library' --install-tests 

* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib'  \
  --no-resave-data --no-manual 
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rdflib_0.0.3.tar.gz’

'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmptt5hxc/rdflib_0.0.3.tar.gz'  \
  --library='/Users/Anna/Library/R/3.4/library' --install-tests 

* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)

comments:

When I first ran devtools::install(pkg_dir, dependencies = T, build_vignettes = T), building the vignettes threw an error, apparently because the Suggests package ‘jqr’ had not been installed yet:

* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd) 
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted
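A sketch of one possible fix, assuming the failure comes only from ‘jqr’ being listed in Suggests but not yet installed: check for the package in the vignette's setup chunk and make the jq chunk conditional on it. `has_jqr` is an illustrative name, not something that exists in the package.

```r
# Setup chunk for rdflib.Rmd (sketch): record whether the Suggests
# package 'jqr' is installed, without attaching it.
has_jqr <- requireNamespace("jqr", quietly = TRUE)

if (!has_jqr) {
  message("Package 'jqr' is not installed; the jq example will not be run.")
}

# The jq chunk's header would then carry the chunk option  eval = has_jqr
# so knitr shows the code but skips evaluating it when jqr is absent.
```

This keeps the vignette buildable from a fresh library while still showing the jq example to readers.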

Check package integrity

run checks on rdflib source:

devtools::check(pkg_dir)
Updating rdflib documentation
Loading rdflib
Setting env vars -------------------------------------------------------------
CFLAGS  : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building rdflib --------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
  --no-environ --no-save --no-restore --quiet CMD build  \
  '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' --no-resave-data  \
  --no-manual 
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rdflib_0.0.3.tar.gz’

Setting env vars -------------------------------------------------------------
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_: FALSE
Checking rdflib --------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
  --no-environ --no-save --no-restore --quiet CMD check  \
  '/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T//Rtmp5P7YC0/rdflib_0.0.3.tar.gz'  \
  --as-cran --timings --no-manual 
* using log directory ‘/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmp5P7YC0/rdflib.Rcheck’
* using R version 3.4.3 (2017-11-30)
* using platform: x86_64-apple-darwin15.6.0 (64-bit)
* using session charset: UTF-8
* using options ‘--no-manual --as-cran’
* checking for file ‘rdflib/DESCRIPTION’ ... OK
* this is package ‘rdflib’ version ‘0.0.3’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘rdflib’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘testthat.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: OK

R CMD check results
0 errors | 0 warnings | 0 notes

comments:


run tests on rdflib source:

devtools::test(pkg_dir)
Loading rdflib
Loading required package: testthat

Attaching package: ‘testthat’

The following objects are masked from ‘package:magrittr’:

    equals, is_less_than, not

The following object is masked from ‘package:devtools’:

    setup

Testing rdflib
✔ | OK F W S | Context

⠏ |  0       | test-rdf.R
⠋ |  1       | test-rdf.R
⠙ |  2       | test-rdf.R
⠹ |  3       | test-rdf.R
⠸ |  4       | test-rdf.R
⠼ |  5       | test-rdf.R
⠴ |  6       | test-rdf.R
⠦ |  7       | test-rdf.R
⠧ |  8       | test-rdf.R
⠇ |  9       | test-rdf.R
⠏ | 10       | test-rdf.R
trying URL 'https://tinyurl.com/ycf95c9h'
Content type 'text/plain; charset=utf-8' length 390 bytes
==================================================
downloaded 390 bytes

⠋ | 11       | test-rdf.R
✔ | 11       | test-rdf.R [1.1 s]

══ Results ════════════════════════════════════════════════════════════════════
Duration: 1.2 s

OK:       11
Failed:   0
Warnings: 0
Skipped:  0

YEE-HAW - priceless code.

comments:


check rdflib for goodpractice:

goodpractice::gp(pkg_dir)
Preparing: covr
Preparing: cyclocomp
* installing *source* package ‘rdflib’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck

♥ Ole! Dandy package! Keep up the first-rate work!

comments:


Check package metadata files

spell check

devtools::spell_check(pkg_dir)
  WORD              FOUND IN
Boettiger         rdflib-package.Rd:43
browseVignettes   rdflib-package.Rd:32
json              description:8
JSON              rdflib-package.Rd:20,23, description:13
jsonld            rdf_parse.Rd:15, rdf_serialize.Rd:17, description:13
ld                description:8
LD                rdflib-package.Rd:20,23, description:13
nquads            rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
ntriples          rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
occured           rdf_serialize.Rd:27
rdf               rdf_add.Rd:5,11,26,29,32, rdf_parse.Rd:11,13,20, rdf_query.Rd:10, rdf_serialize.Rd:11,15, rdf.Rd:5,10,13
RDF               rdf_parse.Rd:5,24, rdf_serialize.Rd:5,30, rdflib-package.Rd:9,11,18,19,20,23, description:1,3,6,7,9
rdfxml            rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
redland           rdf_parse.Rd:20, rdf_query.Rd:14, description:10,11
SPARQL            rdf_query.Rd:5,12,20, rdflib-package.Rd:19,23, description:9
uri               rdf_add.Rd:19,21
URI               rdf_add.Rd:23

comments:


Check documentation

online documentation: https://cboettig.github.io/rdflib/

  • Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?

test rdflib function help files:

help(package = "rdflib")

comments:


test rdflib vignettes:

vignette(package = "rdflib")

comments:

Online documentation:

My main suggestion is to define some terms and improve the concept map for the tools by adding some detail and broader context to the documentation. The following suggestions could also be addressed with links to further details if you think they are superfluous for explicit documentation within the package.

  • a brief intro to the semantic web could be useful (e.g. something like):

The semantic web aims to link data through the web in a machine-readable way, making data more alignable and interoperable, and much easier to search, enrich and compute on.

  • what a graph format for data is (e.g. triples).

  • the structure of an rdf S3 object. You introduce some aspects of the data format with the sentence “user does not have to manage world, model and storage objects by default just to perform standard operations and conversions”, which tells us we can ignore them (which is great) but actually raises more questions: what is this mysterious “world” object that forms an opaque slot of an rdf S3 object? It would be nice to explain the structure of the S3 rdf object briefly. Is there useful metadata that can be extracted from the structure? (see comment later)

  • rdf file formats. I think it would especially aid in appreciating the rdf_serialize function to expand briefly (and potentially signpost to a resource like this) on the various serialization formats, perhaps even why one would use one over another, and particularly why serialization involves writing a file out. I feel these are important concepts for appreciating the use cases of the function. Indeed, the file-out aspect of the function should be flagged more prominently in the function man page: just by reading the (somewhat jargony, if you don’t know what serialization is) description and running the example, you end up writing a file without realising it.

Similarly, parsing can then be described as reading RDF data in from one of these specific formats.

Spelling a few things out in plain English could really help folks follow what’s going on and understand which file types are the inputs or outputs of the different functions.
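As an illustration of the kind of plain-language walkthrough meant here, a short sketch using only the exported functions listed below (rdf(), rdf_add(), rdf_serialize(), rdf_parse()). The example.org subject URI and the literal values are hypothetical.

```r
library(rdflib)

# An RDF graph is a set of triples: subject --predicate--> object.
# rdf() creates an empty in-memory graph (it sets up the redland
# world, storage and model objects behind the scenes).
x <- rdf()

# Two triples about one (hypothetical) resource, using Dublin Core predicates.
rdf_add(x,
        subject   = "http://example.org/paper1",
        predicate = "http://purl.org/dc/elements/1.1/title",
        object    = "An example paper")
rdf_add(x,
        subject   = "http://example.org/paper1",
        predicate = "http://purl.org/dc/elements/1.1/creator",
        object    = "A. Author")

# Serialising = writing the graph out to a file in a chosen format.
# tempfile() keeps the example from leaving files in the working directory.
out <- tempfile(fileext = ".nt")
rdf_serialize(x, doc = out, format = "ntriples")
triples <- readLines(out)  # N-Triples stores one triple per line

# Parsing = reading a file in a given format back into an rdf object.
y <- rdf_parse(out, format = "ntriples")
```

Printing `triples` makes the subject/predicate/object structure visible directly, which could help readers connect the terms to the file formats.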

how do I find info on URIs?

In general, what is missing for me is some signposting/guidance on how to find information on the semantics dictating what information I can extract from an rdf object. E.g. with a data frame or list you could use str() to get an idea of how to start indexing these objects. If confronted with a local rdf file, how would one even go about figuring out what can be queried? I appreciate this is really one of the difficulties of working with rdf and semantic data in general, but I feel some brief guidance or a demo of how one would approach this would go a long way.
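One generic starting point that such guidance could show (a sketch, not taken from the package docs): ask the graph itself which predicates and classes it uses, with SPARQL queries that assume nothing about the vocabulary. It uses the same redland example file as the rest of this review.

```r
library(rdflib)

doc <- system.file("extdata", "dc.rdf", package = "redland")
x <- rdf_parse(doc)

# Which predicates (relationship types) occur in the graph?
# Roughly the RDF analogue of str(): it lists what you can query on.
preds <- rdf_query(x, "SELECT DISTINCT ?p WHERE { ?s ?p ?o }")

# Which classes, if any, are resources declared to be (rdf:type, written 'a')?
types <- rdf_query(x, "SELECT DISTINCT ?type WHERE { ?s a ?type }")

preds
types
```

For dc.rdf the first query should list the three Dublin Core predicates seen in the serialisation output further down; the predicate URIs can then be looked up to learn their meaning.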

examples in general

For clarity to the reader who may not have looked at the function documentation yet, I recommend using full argument names when supplying arguments to functions in vignettes (if not always, at least the first time an argument is introduced).
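For example, the vignette's parsing call written both ways (argument names taken from the rdf_parse signature printed under "Inspect code"):

```r
library(rdflib)

ex <- system.file("extdata/vita.json", package = "rdflib")

# Positional form, as currently written in the vignette:
vita <- rdf_parse(ex, "jsonld")

# Same call with full argument names: a reader new to the package can
# tell what each value is without opening ?rdf_parse.
vita <- rdf_parse(doc = ex, format = "jsonld")
```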

SPARQL queries to JSON data section

At the end of the intro to the section, you write:

Here is a query that for all papers where I am an author, returns a table of given name, family name and year of publication:

Am I right in thinking though that you are co-author on all papers in the rdf but the query is in fact filtering the names of your co-authors? (through FILTER ( ?coi_family != "Boettiger" ))
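If the intent is "co-authors on papers where I am an author", one way to make both halves explicit is to use two author variables. This is a sketch that assumes, as the vignette's FILTER already does, that family names are stored as plain literals such as "Boettiger".

```r
library(rdflib)

ex <- system.file("extdata/vita.json", package = "rdflib")
vita <- rdf_parse(doc = ex, format = "jsonld")

# ?me pins each paper to one author (matched by family name),
# ?coauthor ranges over all authors of the same paper, and the
# FILTER then removes ?me from the co-author list.
sparql <-
 'PREFIX schema: <http://schema.org/>
  SELECT ?coi_given ?coi_family ?year
  WHERE {
    ?paper a schema:ScholarlyArticle .
    ?paper schema:dateCreated ?year .
    ?paper schema:author ?me .
    ?me schema:familyName "Boettiger" .
    ?paper schema:author ?coauthor .
    ?coauthor schema:familyName ?coi_family .
    OPTIONAL { ?coauthor schema:givenName ?coi_given . }
    FILTER ( ?coi_family != "Boettiger" )
  }'

coauthors <- rdf_query(vita, sparql)
```

With the author pattern spelled out, the query no longer relies on every paper in the file happening to include the same author.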

Turning RDF-XML into more friendly JSON

It would be nice, if possible, to see sample printouts of the conversion of the different files, or at least of the effect of compaction.
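A sketch of what such a printout could look like: serialise the redland example graph to JSON-LD, then compact it with `jsonld::jsonld_compact()` against a small context. The context below, mapping the prefix `dc` to the Dublin Core namespace, is an illustrative choice and not the one the vignette uses.

```r
library(rdflib)

doc <- system.file("extdata", "dc.rdf", package = "redland")
x <- rdf_parse(doc)

# rdf_serialize() writes a file, so send the JSON-LD to a temporary path.
out <- tempfile(fileext = ".json")
rdf_serialize(x, doc = out, format = "jsonld")
expanded <- paste(readLines(out), collapse = "\n")

# Compaction: with "dc" mapped to the Dublin Core namespace, full IRIs
# such as http://purl.org/dc/elements/1.1/title are shortened to dc:title.
context <- '{"@context": {"dc": "http://purl.org/dc/elements/1.1/"}}'
compacted <- jsonld::jsonld_compact(expanded, context)

cat(expanded)
compacted
```

Showing `expanded` and `compacted` side by side makes the effect of compaction immediately visible.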

rdf_add man page

It would be nice to see a demo of using one or more of the additional arguments.
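For instance, a sketch using `subjectType`, `objectType` and `datatype_uri` (names from the rdf_add signature printed under "Inspect code") to store a date as a typed literal. It assumes the redland Statement class accepts "uri" and "literal" as node types; the example.org subject is hypothetical.

```r
library(rdflib)

x <- rdf()

# Declare the node types explicitly instead of letting them be guessed,
# and type the object as an xsd:date literal.
rdf_add(x,
        subject      = "http://example.org/paper1",
        predicate    = "http://schema.org/dateCreated",
        object       = "2009-10-19",
        subjectType  = "uri",
        objectType   = "literal",
        datatype_uri = "http://www.w3.org/2001/XMLSchema#date")

# Inspect the result: the datatype URI appears attached to the literal.
out <- tempfile(fileext = ".nt")
rdf_serialize(x, doc = out, format = "ntriples")
line <- readLines(out)
line
```

Typed literals like this are also what date comparisons in SPARQL would work against, which links to the date-filtering comment below.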

Motivating example

I think an additional, more detailed motivating example might illustrate a more direct use case in a researcher’s workflow. In particular, it would be good to highlight the great potential of triplestore APIs (and celebrate the efforts of many cool linked data initiatives, e.g. governmental ones). So an example that incorporates a query to a triplestore and then enrichment of a researcher’s data could be a cool example. This could be a longer-term project or even just an rOpenSci blogpost, but see my comment re: the rdf_query function below.


Test functionality:

  • Are there user interface improvements that could be made?
  • Are there performance improvements that could be made?
library("rdflib")
library(SPARQL)
library(jsonlite)
exports <-ls("package:rdflib")
exports
[1] "rdf"           "rdf_add"       "rdf_parse"     "rdf_query"    
[5] "rdf_serialize"

rdf_serialize

doc <- system.file("extdata", "dc.rdf", package="redland")

doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.nquads", format = "nquads")

doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.rdfxml", format = "rdfxml")


doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.ntriples", format = "ntriples")
readr::read_file("test.nquads")
[1] "<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> \"The generic home page of Dave Beckett.\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> \"Dave Beckett's Home Page\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> \"Dave Beckett\" .\n"
readr::read_file("test.ntriples")
[1] "<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> \"The generic home page of Dave Beckett.\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> \"Dave Beckett's Home Page\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> \"Dave Beckett\" .\n"
readr::read_file("test.rdfxml")
[1] "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n  <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n    <ns0:description xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">The generic home page of Dave Beckett.</ns0:description>\n  </rdf:Description>\n  <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n    <ns0:title xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">Dave Beckett's Home Page</ns0:title>\n  </rdf:Description>\n  <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n    <ns0:creator xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">Dave Beckett</ns0:creator>\n  </rdf:Description>\n</rdf:RDF>\n"
serialisation errors: turtle & trig
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.turtle", format = "turtle")
librdf error - serializer 'turtle' not found
rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of type librdf_serializer is NULL.
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
  rdf_parse() %>%
  rdf_serialize(doc = "test.trig", format = "trig")
librdf error - serializer 'trig' not found
rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of type librdf_serializer is NULL.

rdf_parse

doc %>%
  rdf_parse(type = "jsold")
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> "The generic home page of Dave Beckett." .
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> "Dave Beckett's Home Page" .
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
doc %>%
  rdf_parse(type = "jsonld") %>% class
[1] "rdf"

https://cboettig.github.io/rdflib/articles/rdflib.html

ex <- system.file("extdata/vita.json", package="rdflib")
vita <- rdf_parse(ex, "jsonld")
sparql <-
 'PREFIX schema: <http://schema.org/>
  SELECT ?coi_given ?coi_family ?year
  WHERE { 
    ?paper a schema:ScholarlyArticle . 
    ?paper schema:author ?authors .
    ?paper schema:dateCreated ?year . 
    ?authors schema:familyName ?coi_family .
    OPTIONAL { ?authors schema:givenName ?coi_given . }
    FILTER ( ?coi_family != "Boettiger" )
}
'
vita %>% rdf_query(sparql)
sparql <-
 'PREFIX schema: <http://schema.org/>
  SELECT ?coi_given ?coi_family ?year
  WHERE { 
    ?paper a schema:ScholarlyArticle . 
    ?paper schema:author ?authors .
    ?paper schema:dateCreated ?year . 
    ?authors schema:familyName ?coi_family .
    OPTIONAL { ?authors schema:givenName ?coi_given . }
    FILTER ( ?year == "2009-10-19"^^xs:date)
}
'
vita %>% rdf_query(sparql)
librdf error  - syntax error, unexpected EQ
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.

I’m finding it difficult to query by date. I realise you’ve opted to do date filtering later, but it could be another quite informative additional example, as temporal filtering is probably a widespread use case.
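A sketch of a date filter that should parse. The query above has two problems: SPARQL tests equality with a single `=` (the second `=` of `==` is the "unexpected EQ"), and the `xs:` prefix is never declared. This version compares `str(?year)` as a string. That avoids depending on whether dateCreated was stored as a typed xsd:date or a plain string, but assumes the values are ISO 8601 dates, so lexical order matches chronological order.

```r
library(rdflib)

ex <- system.file("extdata/vita.json", package = "rdflib")
vita <- rdf_parse(doc = ex, format = "jsonld")

# Papers created on or after 1 January 2012, with co-author names.
# An exact match on a typed date would instead need a declared prefix
# and a single '=':
#   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
#   FILTER ( ?year = "2009-10-19"^^xsd:date )
sparql <-
 'PREFIX schema: <http://schema.org/>
  SELECT ?coi_given ?coi_family ?year
  WHERE {
    ?paper a schema:ScholarlyArticle .
    ?paper schema:author ?authors .
    ?paper schema:dateCreated ?year .
    ?authors schema:familyName ?coi_family .
    OPTIONAL { ?authors schema:givenName ?coi_given . }
    FILTER ( str(?year) >= "2012-01-01" )
  }'

recent <- rdf_query(vita, sparql)
```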

sparql <-
 'PREFIX schema: <http://schema.org/>

  SELECT ?coi_given ?coi_family ?year

  WHERE { 
    ?paper a schema:ScholarlyArticle . 
    ?paper schema:author ?authors .
    ?authors schema:familyName ?coi_family .
    OPTIONAL { ?authors schema:givenName ?coi_given . }

    FILTER ( ?coi_family != "Boettiger" )
}
'

vita %>% rdf_query(sparql)

Nice warning message

vita %>% str
List of 2
 $ world:Formal class 'World' [package "redland"] with 1 slot
  .. ..@ librdf_world:Formal class '_p_librdf_world_s' [package "redland"] with 1 slot
  .. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
 list()
 $ model:Formal class 'Model' [package "redland"] with 1 slot
  .. ..@ librdf_model:Formal class '_p_librdf_model_s' [package "redland"] with 1 slot
  .. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
 list()
 - attr(*, "class")= chr "rdf"
vita %>% class
[1] "rdf"
vita$world@librdf_world
An object of class "_p_librdf_world_s"
Slot "ref":
<pointer: 0x101a93760>
vita$model@librdf_model
An object of class "_p_librdf_model_s"
Slot "ref":
<pointer: 0x10dc174f0>
doc <- system.file("extdata", "dc.rdf", package="redland")

sparql <-
'PREFIX dc: <http://purl.org/dc/elements/1.1/>
 SELECT ?author ?c
 WHERE { ?author dc:creator ?c . }'

rdf <- rdf_parse(doc)
rdf_query(rdf, sparql)
jq
  readr::read_file(ex) %>%
  jqr::jq(
     '."@reverse".author[]  | 
       { year: .dateCreated, 
         author: .author[] | [.givenName, .familyName]  | join(" ")
       }') %>%
  jqr::combine() %>%
  jsonlite::fromJSON()

Select the first 15 triples.

library(SPARQL)
endpoint <- "http://linked.bodc.ac.uk/sparql/"
query <- 'select * where {?s ?p ?o . } limit 15'
out <- SPARQL(endpoint, query)
query <- 'select * where {?subject ?predicate ?object . } limit 15'
res <- SPARQL(endpoint, query)
res %>% class
[1] "list"
xml
$results

$namespaces
NULL
"http://www.w3.org/ns/dcat#Dataset"
[1] "http://www.w3.org/ns/dcat#Dataset"
query <- 'select ?dataset where {?dataset a <http://www.w3.org/ns/dcat#Dataset> . } limit 15'
query <- 'SELECT DISTINCT ?s
WHERE {
  ?s a <http://www.w3.org/ns/dcat#Dataset>
} limit 15'
SPARQL(endpoint, query)
$results

$namespaces
NULL
res$results %>% toJSON() %>% class() 
[1] "json"
SPARQL::commonns
 [1] "xsd"                                          
 [2] "<http://www.w3.org/2001/XMLSchema#>"          
 [3] "rdf"                                          
 [4] "<http://www.w3.org/1999/02/22-rdf-syntax-ns#>"
 [5] "rdfs"                                         
 [6] "<http://www.w3.org/2000/01/rdf-schema#>"      
 [7] "owl"                                          
 [8] "<http://www.w3.org/2002/07/owl#>"             
 [9] "skos"                                         
[10] "<http://www.w3.org/2004/02/skos/core#>"       
[11] "dc"                                           
[12] "<http://purl.org/dc/elements/1.1/>"           
[13] "foaf"                                         
[14] "<http://xmlns.com/foaf/0.1/>"                 
[15] "wgs84"                                        
[16] "<http://www.w3.org/2003/01/geo/wgs84_pos#>"   
[17] "qb"                                           
[18] "<http://purl.org/linked-data/cube#>"          
ex <- system.file("extdata/vita.json", package="rdflib")
vita <- rdf_parse(ex, "json")
sparql <-
 'PREFIX schema: <http://schema.org/>
  SELECT ?coi_given ?coi_family ?year
  WHERE { 
    ?paper a schema:ScholarlyArticle . 
    ?paper schema:author ?authors .
    ?paper schema:dateCreated ?year . 
    ?authors schema:familyName ?coi_family .
    OPTIONAL { ?authors schema:givenName ?coi_given . }
    FILTER ( ?coi_family != "Boettiger" )
}
'
vita %>% rdf_query(sparql)

comments:

I know this is more a matter for the redland package, and that the whole point is not having to think about it, but I’d still like to know what world is, despite this reference in the vignette: “user does not have to manage world, model and storage objects by default just to perform standard operations and conversions”.

I’m not really sure what they are. It would be nice to explain the structure of the S3 rdf object. Is there useful metadata that can be extracted from the structure?

  • in the rdf_serialize documentation: I think a brief mention of what serialisation actually is would really help in understanding this function. I also feel it should be flagged more prominently that the function writes a file out.
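A sketch of how the man page example could make the file output explicit: point `doc` at a temporary file and show that the file now exists. This is a suggestion for the example, not the package's current one.

```r
library(rdflib)

doc <- system.file("extdata", "dc.rdf", package = "redland")
x <- rdf_parse(doc)

# rdf_serialize() writes the graph to the path given in `doc`.
# A temporary path keeps the example from writing into the working directory.
out <- tempfile(fileext = ".rdf")
rdf_serialize(x, doc = out, format = "rdfxml")

# Make the side effect visible to whoever runs the example.
file.exists(out)
```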

Inspect code:

pkgreviewr::pkgreview_print_source("rdflib")
## rdf
function () 
{
    world <- new("World")
    storage <- new("Storage", world, "hashes", name = "", options = "hash-type='memory'")
    model <- new("Model", world = world, storage, options = "")
    structure(list(world = world, model = model), class = "rdf")
}
<environment: namespace:rdflib>
--- 
 
## rdf_add
function (x, subject, predicate, object, subjectType = as.character(NA), 
    objectType = as.character(NA), datatype_uri = as.character(NA)) 
{
    stmt <- new("Statement", world = x$world, subject, predicate, 
        object, subjectType, objectType, datatype_uri)
    addStatement(x$model, stmt)
    invisible(x)
}
<environment: namespace:rdflib>
--- 
 
## rdf_parse
function (doc, format = c("rdfxml", "nquads", "ntriples", "trig", 
    "turtle", "jsonld"), ...) 
{
    format <- match.arg(format)
    doc <- text_or_url_to_doc(doc)
    if (format == "jsonld") {
        tmp <- tempfile()
        tmp <- add_base_uri(doc, tmp)
        rdf <- jsonld::jsonld_to_rdf(tmp)
        writeLines(rdf, tmp)
        format <- "nquads"
        doc <- tmp
    }
    x <- rdf()
    mimetype <- unname(rdf_mimetypes[format])
    parser <- new("Parser", x$world, name = format, mimeType = mimetype)
    redland::parseFileIntoModel(parser, x$world, doc, x$model)
    x
}
<environment: namespace:rdflib>
--- 
 
## rdf_query
function (x, query, ...) 
{
    queryObj <- new("Query", x$world, query, ...)
    queryResult <- redland::executeQuery(queryObj, x$model)
    out <- list()
    result <- redland::getNextResult(queryResult)
    out <- c(out, result)
    while (!is.null(result)) {
        result <- redland::getNextResult(queryResult)
        out <- c(out, result)
    }
    redland::freeQueryResults(queryResult)
    redland::freeQuery(queryObj)
    rectangularize_query_results(out)
}
<environment: namespace:rdflib>
--- 
 
## rdf_serialize
function (x, doc, format = c("rdfxml", "nquads", "ntriples", 
    "trig", "turtle", "jsonld"), namespace = NULL, prefix = NULL, 
    ...) 
{
    format <- match.arg(format)
    jsonld_output <- format == "jsonld"
    if (jsonld_output) {
        format <- "nquads"
    }
    mimetype <- rdf_mimetypes[format]
    serializer <- new("Serializer", x$world, name = format, mimeType = mimetype)
    if (!is.null(namespace)) {
        redland::setNameSpace(serializer, x$world, namespace = namespace, 
            prefix = prefix)
    }
    status <- redland::serializeToFile(serializer, x$world, x$model, 
        doc)
    if (jsonld_output) {
        txt <- paste(readLines(doc), collapse = "\n")
        if (length(txt) > 0) {
            json <- jsonld::jsonld_from_rdf(txt)
            writeLines(json, doc)
        }
    }
    invisible(status)
}
<environment: namespace:rdflib>
--- 
 
$rdf
NULL

$rdf_add
NULL

$rdf_parse
NULL

$rdf_query
NULL

$rdf_serialize
NULL
redland::serializeToFile
nonstandardGenericFunction for "serializeToFile" defined from package "redland"

function (.Object, world, model, filePath, ...) 
{
    standardGeneric("serializeToFile")
}
<environment: 0x10677db20>
Methods may be defined for arguments: .Object, world, model, filePath
Use  showMethods("serializeToFile")  for currently available ones.
rdflib:::rdf_mimetypes
                 nquads                ntriples                  rdfxml                    trig 
        "text/x-nquads" "application/n-triples"   "application/rdf+xml"    "application/x-trig" 
                 turtle 
   "application/turtle" 
rdflib:::rectangularize_query_results
function (out) 
{
    vars <- unique(names(out))
    X <- lapply(vars, function(v) gsub("\"(([^\\^])+)\"\\^*.*", 
        "\\1", as.character(out[names(out) == v])))
    names(X) <- vars
    as.data.frame(X, stringsAsFactors = FALSE)
}
<environment: namespace:rdflib>

Is there a way to return a non-rectangularised query result, i.e. return an rdf object instead?

I’m thinking about a use case where it might be better to enrich data by merging rdfs.

I.e. a researcher queries a triplestore through an API (yay, open data!), combines it with their own not-fully-matching but interoperable rdf data using rdf_add (i.e. showing how a triplestore is better than tabular, non-linked data for merging), and then queries the merged rdf to extract an enriched analytical tabular dataset?
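A minimal sketch of that workflow with the current exports. The redland example file stands in for data fetched from a triplestore, and the predicate http://example.org/vocab/reviewStatus and its value are hypothetical "researcher's own" data.

```r
library(rdflib)

# Stand-in for RDF retrieved from a triplestore: the redland example graph.
doc <- system.file("extdata", "dc.rdf", package = "redland")
merged <- rdf_parse(doc)

# Add the researcher's own (hypothetical) fact about the same subject.
# Because both sources use the same URI, the new triple attaches to the
# existing node: no join key or column matching is needed.
rdf_add(merged,
        subject   = "http://www.dajobe.org/",
        predicate = "http://example.org/vocab/reviewStatus",
        object    = "checked")

# Query across both sources at once and get back a flat table.
sparql <-
 'PREFIX dc: <http://purl.org/dc/elements/1.1/>
  SELECT ?title ?status
  WHERE {
    ?page dc:title ?title .
    ?page <http://example.org/vocab/reviewStatus> ?status .
  }'
enriched <- rdf_query(merged, sparql)
enriched
```

In a real workflow the first step would parse the endpoint's response rather than a bundled file.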



comments:

Review test suite:

test coverage

covr::package_coverage(pkg_dir)
rdflib Coverage: 100.00%
R/rdf.R: 100.00%

inspect tests

comments:

Add tests for serialising to trig and turtle, which currently throws an error.

Perhaps a test for parsing/serialising each format could be good.
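A sketch of such a round-trip test, e.g. for a tests/testthat/test-formats.R file. `roundtrip_n()` is a hypothetical helper. The expected count of 3 comes from the three triples in dc.rdf shown in the serialisation output above. On a redland build without turtle/trig serialisers, those two cases should fail, which is exactly what the tests are meant to surface.

```r
library(testthat)
library(rdflib)

# Hypothetical helper: serialise the example graph to `fmt`, parse it
# back in the same format, and count the triples that survived.
roundtrip_n <- function(fmt) {
  doc <- system.file("extdata", "dc.rdf", package = "redland")
  x <- rdf_parse(doc)
  out <- tempfile(fileext = paste0(".", fmt))
  on.exit(unlink(out))
  rdf_serialize(x, doc = out, format = fmt)
  y <- rdf_parse(out, format = fmt)
  nrow(rdf_query(y, "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"))
}

for (fmt in c("rdfxml", "nquads", "ntriples", "jsonld", "turtle", "trig")) {
  test_that(paste("rdf round-trips through", fmt), {
    expect_equal(roundtrip_n(fmt), 3L)
  })
}
```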


