rdflib - package review
This report documents the review of the rOpenSci submitted package rdflib (ropensci/onboarding issue #169).
Description:
The Resource Description Framework, or ‘RDF’ is a widely used data representation model that forms the cornerstone of the Semantic Web. ‘RDF’ represents data as a graph rather than the familiar data table or rectangle of relational databases. The ‘rdflib’ package provides a friendly and concise user interface for performing common tasks on ‘RDF’ data, such as reading, writing and converting between the various serializations of ‘RDF’ data, including ‘rdfxml’, ‘turtle’, ‘nquads’, ‘ntriples’, ‘trig’, and ‘json-ld’; creating new ‘RDF’ graphs, and performing graph queries using ‘SPARQL’. This package wraps the low level ‘redland’ R package which provides direct bindings to the ‘redland’ C library. Additionally, the package supports the newer and more developer friendly ‘JSON-LD’ format through the ‘jsonld’ package. The package interface takes inspiration from the Python ‘rdflib’ library.
Author: Carl Boettiger cboettig@gmail.com [aut, cre, cph] (0000-0002-1642-628X)
repo url: https://github.com/cboettig/rdflib
website url: https://cboettig.github.io/rdflib/
key review checks:
Please be respectful and kind to the authors in your reviews. The rOpenSci code of conduct is mandatory for everyone involved in our review process.
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] jsonlite_1.5 SPARQL_1.16 RCurl_1.95-4.8 bitops_1.0-6 XML_3.98-1.9 rdflib_0.0.3
[7] magrittr_1.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 xmlparsedata_1.0.1 compiler_3.4.3 pillar_1.1.0 base64enc_0.1-3
[6] remotes_1.1.0 tools_3.4.3 digest_0.6.14 praise_1.0.0 memoise_1.1.0
[11] evaluate_0.10.1 tibble_1.4.2 pkgconfig_2.0.1 rlang_0.1.6 rex_1.1.2
[16] whoami_1.1.2 rstudioapi_0.7 commonmark_1.4 curl_3.1 yaml_2.1.16
[21] cyclocomp_1.1.0 roxygen2_6.0.1 stringr_1.2.0 pkgreviewr_0.1.0 withr_2.1.1.9000
[26] jqr_1.0.0 httr_1.3.1 knitr_1.18 xml2_1.2.0 desc_1.1.1
[31] devtools_1.13.4 hms_0.4.0 redland_1.0.17-9 jsonld_1.2 rprojroot_1.3-2
[36] R6_2.2.2 rcmdcheck_1.2.1 rmarkdown_1.8 callr_1.0.0 readr_1.1.1
[41] lintr_1.0.2 covr_3.0.1 backports_1.1.2 clisymbols_1.2.0 htmltools_0.3.6
[46] rsconnect_0.8.5 assertthat_0.2.0 goodpractice_1.0.0 V8_1.5 stringi_1.1.6
[51] lazyeval_0.2.1 crayon_1.3.4
Install rdflib from GitHub with:
devtools::install_github("cboettig/rdflib", force = T, dependencies = T)
Downloading GitHub repo cboettig/rdflib@master
from URL https://api.github.com/repos/cboettig/rdflib/zipball/master
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save \
--no-restore --quiet CMD INSTALL \
'/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmptt5hxc/devtools671d75f53311/cboettig-rdflib-2c150e2' \
--library='/Users/Anna/Library/R/3.4/library' --install-tests
* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
remove.packages("rdflib")
Removing package from ‘/Users/Anna/Library/R/3.4/library’
(as ‘lib’ is unspecified)
Local rdflib install:
pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save \
--no-restore --quiet CMD build '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' \
--no-resave-data --no-manual
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd)
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted
Error: Command failed (1)
pkg_dir <- "/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib-review/../rdflib"
devtools::install(pkg_dir, dependencies = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save \
--no-restore --quiet CMD INSTALL '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' \
--library='/Users/Anna/Library/R/3.4/library' --install-tests
* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
devtools::install(pkg_dir, dependencies = T, build_vignettes = T)
Installing rdflib
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save \
--no-restore --quiet CMD build '/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' \
--no-resave-data --no-manual
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rdflib_0.0.3.tar.gz’
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file --no-environ --no-save \
--no-restore --quiet CMD INSTALL \
'/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmptt5hxc/rdflib_0.0.3.tar.gz' \
--library='/Users/Anna/Library/R/3.4/library' --install-tests
* installing *source* package ‘rdflib’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
When I first ran devtools::install(pkg_dir, dependencies = T, build_vignettes = T), building the vignettes threw an error, apparently because the Suggests package ‘jqr’ had not been installed yet:
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
Quitting from lines 21-38 (rdflib.Rmd)
Error: processing vignette 'rdflib.Rmd' failed with diagnostics:
there is no package called 'jqr'
Execution halted
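One way to catch this before the vignette build is to check up front that the suggested packages are available. A minimal base R sketch; `missing_pkgs()` is a hypothetical helper name introduced here for illustration and is not part of devtools or rdflib:

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# 'jqr' is the Suggests package named in the error above.
missing_pkgs("jqr")
```

If the result is non-empty, installing those packages before calling devtools::install() with build_vignettes = T should avoid the failed build.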
Check rdflib source:
devtools::check(pkg_dir)
Updating rdflib documentation
Loading rdflib
Setting env vars -------------------------------------------------------------
CFLAGS : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building rdflib --------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
--no-environ --no-save --no-restore --quiet CMD build \
'/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib' --no-resave-data \
--no-manual
* checking for file ‘/Users/Anna/Documents/workflows/rOpenSci/reviews/rdflib/DESCRIPTION’ ... OK
* preparing ‘rdflib’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rdflib_0.0.3.tar.gz’
Setting env vars -------------------------------------------------------------
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_: FALSE
Checking rdflib --------------------------------------------------------------
'/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file \
--no-environ --no-save --no-restore --quiet CMD check \
'/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T//Rtmp5P7YC0/rdflib_0.0.3.tar.gz' \
--as-cran --timings --no-manual
* using log directory ‘/private/var/folders/8p/87cqdx2s34vfvcgh04l6z72w0000gn/T/Rtmp5P7YC0/rdflib.Rcheck’
* using R version 3.4.3 (2017-11-30)
* using platform: x86_64-apple-darwin15.6.0 (64-bit)
* using session charset: UTF-8
* using options ‘--no-manual --as-cran’
* checking for file ‘rdflib/DESCRIPTION’ ... OK
* this is package ‘rdflib’ version ‘0.0.3’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘rdflib’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
Running ‘testthat.R’
OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking re-building of vignette outputs ... OK
* DONE
Status: OK
R CMD check results
0 errors | 0 warnings | 0 notes
Test rdflib source:
devtools::test(pkg_dir)
Loading rdflib
Loading required package: testthat
Attaching package: ‘testthat’
The following objects are masked from ‘package:magrittr’:
equals, is_less_than, not
The following object is masked from ‘package:devtools’:
setup
Testing rdflib
✔ | OK F W S | Context
trying URL 'https://tinyurl.com/ycf95c9h'
Content type 'text/plain; charset=utf-8' length 390 bytes
==================================================
downloaded 390 bytes
✔ | 11 | test-rdf.R [1.1 s]
══ Results ════════════════════════════════════════════════════════════════════
Duration: 1.2 s
OK: 11
Failed: 0
Warnings: 0
Skipped: 0
YEE-HAW - priceless code.
Check rdflib for goodpractice:
goodpractice::gp(pkg_dir)
Preparing: covr
Preparing: cyclocomp
* installing *source* package ‘rdflib’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (rdflib)
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
♥ Ole! Dandy package! Keep up the first-rate work!
devtools::spell_check(pkg_dir)
WORD FOUND IN
Boettiger rdflib-package.Rd:43
browseVignettes rdflib-package.Rd:32
json description:8
JSON rdflib-package.Rd:20,23, description:13
jsonld rdf_parse.Rd:15, rdf_serialize.Rd:17, description:13
ld description:8
LD rdflib-package.Rd:20,23, description:13
nquads rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
ntriples rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
occured rdf_serialize.Rd:27
rdf rdf_add.Rd:5,11,26,29,32, rdf_parse.Rd:11,13,20, rdf_query.Rd:10, rdf_serialize.Rd:11,15, rdf.Rd:5,10,13
RDF rdf_parse.Rd:5,24, rdf_serialize.Rd:5,30, rdflib-package.Rd:9,11,18,19,20,23, description:1,3,6,7,9
rdfxml rdf_parse.Rd:14, rdf_serialize.Rd:16, description:8
redland rdf_parse.Rd:20, rdf_query.Rd:14, description:10,11
SPARQL rdf_query.Rd:5,12,20, rdflib-package.Rd:19,23, description:9
uri rdf_add.Rd:19,21
URI rdf_add.Rd:23
online documentation: https://cboettig.github.io/rdflib/
rdflib function help files:
help(package = "rdflib")
rdflib vignettes:
vignette(package = "rdflib")
My main suggestion is to define some terms and improve the concept map for the tools by adding some detail and broader context to the documentation. The following suggestions could also be addressed with links to further details if you think they are too superfluous for explicit documentation within the package.
The semantic web aims to link data in a machine-readable way through the web, making data more alignable and interoperable, and much easier to search, enrich and compute on.
what a graph format for data is (e.g. triples etc.).
the structure of an rdf S3 object. You introduced some aspects of the data format here: "(user does not have to manage world, model and storage objects by default just to perform standard operations and conversions)", which we are told we can ignore (which is great), but this actually creates more questions: what is this mysterious “world” object that forms an opaque slot of an rdf S3 object? It would be nice to explain the structure of the S3 rdf object briefly. Is there useful metadata that can be extracted from the structure? (see comment later)
rdf file formats. I think it would especially aid in appreciating the rdf_serialize function to expand briefly (and potentially signpost to a resource like this) on the various serialization formats, perhaps even why one would use one over another, and particularly why serialization involves writing a file out. I feel these are important concepts to help appreciate use cases of the function. Indeed, the file-out aspect of the function could do with being flagged more prominently in the function man page: just by looking at the description (somewhat jargony if you don’t know what serialization is) and running the example, you end up writing a file without realising.
Similarly, parsing can then be seen/described as reading in an rdf from one of these specific formats.
Spelling a few things out in plain English could really help folks follow what’s going on and understand what file types are inputs or outputs of different functions.
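To illustrate the kind of plain-language framing I mean, a triple is just a (subject, predicate, object) statement, and a graph is a set of such statements. A base R sketch, not using rdflib itself; the values are taken from the dc.rdf example output further down, and the sprintf() formatting only mimics the N-Triples layout for simple string literals:

```r
# Two statements about one resource, each a (subject, predicate, object) triple.
triples <- data.frame(
  subject   = "http://www.dajobe.org/",
  predicate = c("http://purl.org/dc/elements/1.1/title",
                "http://purl.org/dc/elements/1.1/creator"),
  object    = c("Dave Beckett's Home Page", "Dave Beckett"),
  stringsAsFactors = FALSE
)

# Lay each row out the way the ntriples serialization looks: <s> <p> "o" .
ntriples <- sprintf('<%s> <%s> "%s" .',
                    triples$subject, triples$predicate, triples$object)
writeLines(ntriples)
```

Seen this way, serializing means writing these statements out in one of the text formats, and parsing means reading them back in.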
In general, what is missing for me is some signposting/guidance on how I can find information on the semantics dictating what information I can extract from an rdf object. E.g. with a df or list you could use str to get an idea of how you could start indexing these objects. If confronted with a local rdf file, how would one go about figuring out even what they can query? I appreciate this is really one of the difficulties of working with rdf and semantic data in general, but I feel some brief guidance or a demo on how one would approach this would go a long way.
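As a possible starting point for such a demo, a generic SPARQL query can list the distinct predicates present in a graph, which plays a role loosely similar to str for a list. A hedged sketch: it assumes rdflib and redland are installed and uses the dc.rdf file shipped with redland; `explore_query` is a name introduced here for illustration:

```r
# Generic SPARQL query: every distinct predicate used anywhere in the graph.
explore_query <- 'SELECT DISTINCT ?p WHERE { ?s ?p ?o . }'

# Only run against real data when the packages are available.
if (requireNamespace("rdflib", quietly = TRUE) &&
    requireNamespace("redland", quietly = TRUE)) {
  doc <- system.file("extdata", "dc.rdf", package = "redland")
  x <- rdflib::rdf_parse(doc)
  predicates <- rdflib::rdf_query(x, explore_query)
  print(predicates)
}
```

The returned predicates can then be used to build more specific queries.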
For clarity to the reader who may not have looked at the function documentation yet, I recommend using the full argument names when supplying arguments to functions in vignettes (if not always, at least the first time an argument is introduced).
At the end of the intro to the section, you write:
Here is a query that for all papers where I am an author, returns a table of given name, family name and year of publication:
Am I right in thinking, though, that you are a co-author on all papers in the rdf, and that the query is in fact filtering for the names of your co-authors (through FILTER ( ?coi_family != "Boettiger" ))?
It would be nice, if possible, to see sample printouts of the conversion of the different files, or at least of the effect of compaction.
rdf_add man page
Would be nice to see a demo of using one or more of the additional arguments.
I think an additional, more detailed motivating example might illustrate a more direct use case in a researcher’s workflow. In particular it would be good to highlight the great potential of triplestore APIs (and celebrate the efforts of many cool, e.g. governmental, linked data initiatives). So an example that incorporates a query to a triplestore and then enrichment of a researcher’s data could be a cool example. This could be a longer term project or even just an rOpenSci blogpost, but see the comment re: the rdf_query function below.
library("rdflib")
library(SPARQL)
library(jsonlite)
exports <-ls("package:rdflib")
exports
[1] "rdf" "rdf_add" "rdf_parse" "rdf_query"
[5] "rdf_serialize"
rdf_serialize
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
rdf_parse() %>%
rdf_serialize(doc = "test.nquads", format = "nquads")
doc %>%
rdf_parse() %>%
rdf_serialize(doc = "test.rdfxml", format = "rdfxml")
doc %>%
rdf_parse() %>%
rdf_serialize(doc = "test.ntriples", format = "ntriples")
readr::read_file("test.nquads")
[1] "<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> \"The generic home page of Dave Beckett.\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> \"Dave Beckett's Home Page\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> \"Dave Beckett\" .\n"
readr::read_file("test.ntriples")
[1] "<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> \"The generic home page of Dave Beckett.\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> \"Dave Beckett's Home Page\" .\n<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> \"Dave Beckett\" .\n"
readr::read_file("test.rdfxml")
[1] "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n <ns0:description xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">The generic home page of Dave Beckett.</ns0:description>\n </rdf:Description>\n <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n <ns0:title xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">Dave Beckett's Home Page</ns0:title>\n </rdf:Description>\n <rdf:Description rdf:about=\"http://www.dajobe.org/\">\n <ns0:creator xmlns:ns0=\"http://purl.org/dc/elements/1.1/\">Dave Beckett</ns0:creator>\n </rdf:Description>\n</rdf:RDF>\n"
turtle & trig
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
rdf_parse() %>%
rdf_serialize(doc = "test.turtle", format = "turtle")
librdf error - serializer 'turtle' not found
rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of type librdf_serializer is NULL.
library(magrittr)
library(rdflib)
doc <- system.file("extdata", "dc.rdf", package="redland")
doc %>%
rdf_parse() %>%
rdf_serialize(doc = "test.trig", format = "trig")
librdf error - serializer 'trig' not found
rdf_serializer.c:597: (librdf_serializer_serialize_model_to_file) assertion failed: object pointer of type librdf_serializer is NULL.
rdf_serialize
doc %>%
rdf_parse(type = "jsold")
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/description> "The generic home page of Dave Beckett." .
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/title> "Dave Beckett's Home Page" .
<http://www.dajobe.org/> <http://purl.org/dc/elements/1.1/creator> "Dave Beckett" .
doc %>%
rdf_parse(type = "jsonld") %>% class
[1] "rdf"
ex <- system.file("extdata/vita.json", package="rdflib")
vita <- rdf_parse(ex, "jsonld")
sparql <-
'PREFIX schema: <http://schema.org/>
SELECT ?coi_given ?coi_family ?year
WHERE {
?paper a schema:ScholarlyArticle .
?paper schema:author ?authors .
?paper schema:dateCreated ?year .
?authors schema:familyName ?coi_family .
OPTIONAL { ?authors schema:givenName ?coi_given . }
FILTER ( ?coi_family != "Boettiger" )
}
'
vita %>% rdf_query(sparql)
sparql <-
'PREFIX schema: <http://schema.org/>
SELECT ?coi_given ?coi_family ?year
WHERE {
?paper a schema:ScholarlyArticle .
?paper schema:author ?authors .
?paper schema:dateCreated ?year .
?authors schema:familyName ?coi_family .
OPTIONAL { ?authors schema:givenName ?coi_given . }
FILTER ( ?year == "2009-10-19"^^xs:date)
}
'
vita %>% rdf_query(sparql)
librdf error - syntax error, unexpected EQ
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.
rdf_query_results.c:100: (librdf_query_results_finished) assertion failed: object pointer of type librdf_query_results is NULL.
I’m finding it difficult to query by date. I realise you’ve opted to do date filtering later, but it could be another quite informative additional example, as temporal filtering is probably a widespread use case.
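The "unexpected EQ" error suggests the parser is objecting to `==`; SPARQL's equality operator is a single `=`, and the `xs:` prefix is also never declared in the query. A hedged sketch of one possible alternative, which compares the lexical form of the date via str() and so sidesteps the typed literal altogether. It assumes the dateCreated values have the lexical form "2009-10-19" and has not been verified against vita.json; `sparql_date` is a name introduced here:

```r
# Same query as above, with a single '=' and a string comparison on the date.
sparql_date <-
'PREFIX schema: <http://schema.org/>
SELECT ?coi_given ?coi_family ?year
WHERE {
  ?paper a schema:ScholarlyArticle .
  ?paper schema:author ?authors .
  ?paper schema:dateCreated ?year .
  ?authors schema:familyName ?coi_family .
  OPTIONAL { ?authors schema:givenName ?coi_given . }
  FILTER ( str(?year) = "2009-10-19" )
}
'

# Only run against the example data when the packages and file are available.
if (requireNamespace("rdflib", quietly = TRUE) &&
    requireNamespace("jsonld", quietly = TRUE)) {
  ex <- system.file("extdata/vita.json", package = "rdflib")
  if (nzchar(ex)) {
    vita <- rdflib::rdf_parse(ex, format = "jsonld")
    print(rdflib::rdf_query(vita, sparql_date))
  }
}
```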
sparql <-
'PREFIX schema: <http://schema.org/>
SELECT ?coi_given ?coi_family ?year
WHERE {
?paper a schema:ScholarlyArticle .
?paper schema:author ?authors .
?authors schema:familyName ?coi_family .
OPTIONAL { ?authors schema:givenName ?coi_given . }
FILTER ( ?coi_family != "Boettiger" )
}
'
vita %>% rdf_query(sparql)
Nice warning message
vita %>% str
List of 2
$ world:Formal class 'World' [package "redland"] with 1 slot
.. ..@ librdf_world:Formal class '_p_librdf_world_s' [package "redland"] with 1 slot
.. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
list()
$ model:Formal class 'Model' [package "redland"] with 1 slot
.. ..@ librdf_model:Formal class '_p_librdf_model_s' [package "redland"] with 1 slot
.. .. .. ..@ ref:Formal class 'externalptr' [package ""] with 0 slots
list()
- attr(*, "class")= chr "rdf"
vita %>% class
[1] "rdf"
vita$world@librdf_world
An object of class "_p_librdf_world_s"
Slot "ref":
<pointer: 0x101a93760>
vita$model@librdf_model
An object of class "_p_librdf_model_s"
Slot "ref":
<pointer: 0x10dc174f0>
doc <- system.file("extdata", "dc.rdf", package="redland")
sparql <-
'PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?author ?c
WHERE { ?author dc:creator ?c . }'
rdf <- rdf_parse(doc)
rdf_query(rdf, sparql)
jq
readr::read_file(ex) %>%
jqr::jq(
'."@reverse".author[] |
{ year: .dateCreated,
author: .author[] | [.givenName, .familyName] | join(" ")
}') %>%
jqr::combine() %>%
jsonlite::fromJSON()
Select the first 15 triplets.
library(SPARQL)
endpoint <- "http://linked.bodc.ac.uk/sparql/"
query <- 'select * where {?s ?p ?o . } limit 15'
out <- SPARQL(endpoint, query)
query <- 'select * where {?subject ?predicate ?object . } limit 15'
res <- SPARQL(endpoint, query)
res %>% class
[1] "list"
xml
$results
$namespaces
NULL
"http://www.w3.org/ns/dcat#Dataset"
[1] "http://www.w3.org/ns/dcat#Dataset"
query <- 'select ?dataset where {?dataset a <http://www.w3.org/ns/dcat#Dataset> . } limit 15'
query <- 'SELECT DISTINCT ?s
WHERE {
?s a <http://www.w3.org/ns/dcat#Dataset>
} limit 15'
SPARQL(endpoint, query)
$results
$namespaces
NULL
res$results %>% toJSON() %>% class()
[1] "json"
SPARQL::commonns
[1] "xsd"
[2] "<http://www.w3.org/2001/XMLSchema#>"
[3] "rdf"
[4] "<http://www.w3.org/1999/02/22-rdf-syntax-ns#>"
[5] "rdfs"
[6] "<http://www.w3.org/2000/01/rdf-schema#>"
[7] "owl"
[8] "<http://www.w3.org/2002/07/owl#>"
[9] "skos"
[10] "<http://www.w3.org/2004/02/skos/core#>"
[11] "dc"
[12] "<http://purl.org/dc/elements/1.1/>"
[13] "foaf"
[14] "<http://xmlns.com/foaf/0.1/>"
[15] "wgs84"
[16] "<http://www.w3.org/2003/01/geo/wgs84_pos#>"
[17] "qb"
[18] "<http://purl.org/linked-data/cube#>"
ex <- system.file("extdata/vita.json", package="rdflib")
vita <- rdf_parse(ex, "json")
sparql <-
'PREFIX schema: <http://schema.org/>
SELECT ?coi_given ?coi_family ?year
WHERE {
?paper a schema:ScholarlyArticle .
?paper schema:author ?authors .
?paper schema:dateCreated ?year .
?authors schema:familyName ?coi_family .
OPTIONAL { ?authors schema:givenName ?coi_given . }
FILTER ( ?coi_family != "Boettiger" )
}
'
vita %>% rdf_query(sparql)
I know this is more to do with the redland package, and that the whole point is not having to manage it, but I’d kinda like to know what world is. Despite this reference in the vignette:
> user does not have to manage world, model and storage objects by default just to perform standard operations and conversions)
I’m not really sure what they are. It would be nice to explain what the structure of the S3 rdf object is. Is there useful metadata that can be extracted from the structure?
rdf_serialize documentation: I think a brief mention of what serialisation actually is would really help in understanding this function. I also feel it should be flagged more prominently that the function writes a file out.
pkgreviewr::pkgreview_print_source("rdflib")
## rdf
function ()
{
world <- new("World")
storage <- new("Storage", world, "hashes", name = "", options = "hash-type='memory'")
model <- new("Model", world = world, storage, options = "")
structure(list(world = world, model = model), class = "rdf")
}
<environment: namespace:rdflib>
---
## rdf_add
function (x, subject, predicate, object, subjectType = as.character(NA),
objectType = as.character(NA), datatype_uri = as.character(NA))
{
stmt <- new("Statement", world = x$world, subject, predicate,
object, subjectType, objectType, datatype_uri)
addStatement(x$model, stmt)
invisible(x)
}
<environment: namespace:rdflib>
---
## rdf_parse
function (doc, format = c("rdfxml", "nquads", "ntriples", "trig",
"turtle", "jsonld"), ...)
{
format <- match.arg(format)
doc <- text_or_url_to_doc(doc)
if (format == "jsonld") {
tmp <- tempfile()
tmp <- add_base_uri(doc, tmp)
rdf <- jsonld::jsonld_to_rdf(tmp)
writeLines(rdf, tmp)
format <- "nquads"
doc <- tmp
}
x <- rdf()
mimetype <- unname(rdf_mimetypes[format])
parser <- new("Parser", x$world, name = format, mimeType = mimetype)
redland::parseFileIntoModel(parser, x$world, doc, x$model)
x
}
<environment: namespace:rdflib>
---
## rdf_query
function (x, query, ...)
{
queryObj <- new("Query", x$world, query, ...)
queryResult <- redland::executeQuery(queryObj, x$model)
out <- list()
result <- redland::getNextResult(queryResult)
out <- c(out, result)
while (!is.null(result)) {
result <- redland::getNextResult(queryResult)
out <- c(out, result)
}
redland::freeQueryResults(queryResult)
redland::freeQuery(queryObj)
rectangularize_query_results(out)
}
<environment: namespace:rdflib>
---
## rdf_serialize
function (x, doc, format = c("rdfxml", "nquads", "ntriples",
"trig", "turtle", "jsonld"), namespace = NULL, prefix = NULL,
...)
{
format <- match.arg(format)
jsonld_output <- format == "jsonld"
if (jsonld_output) {
format <- "nquads"
}
mimetype <- rdf_mimetypes[format]
serializer <- new("Serializer", x$world, name = format, mimeType = mimetype)
if (!is.null(namespace)) {
redland::setNameSpace(serializer, x$world, namespace = namespace,
prefix = prefix)
}
status <- redland::serializeToFile(serializer, x$world, x$model,
doc)
if (jsonld_output) {
txt <- paste(readLines(doc), collapse = "\n")
if (length(txt) > 0) {
json <- jsonld::jsonld_from_rdf(txt)
writeLines(json, doc)
}
}
invisible(status)
}
<environment: namespace:rdflib>
---
$rdf
NULL
$rdf_add
NULL
$rdf_parse
NULL
$rdf_query
NULL
$rdf_serialize
NULL
redland::serializeToFile
nonstandardGenericFunction for "serializeToFile" defined from package "redland"
function (.Object, world, model, filePath, ...)
{
standardGeneric("serializeToFile")
}
<environment: 0x10677db20>
Methods may be defined for arguments: .Object, world, model, filePath
Use showMethods("serializeToFile") for currently available ones.
rdflib:::rdf_mimetypes
nquads ntriples rdfxml trig
"text/x-nquads" "application/n-triples" "application/rdf+xml" "application/x-trig"
turtle
"application/turtle"
rdflib:::rectangularize_query_results
function (out)
{
vars <- unique(names(out))
X <- lapply(vars, function(v) gsub("\"(([^\\^])+)\"\\^*.*",
"\\1", as.character(out[names(out) == v])))
names(X) <- vars
as.data.frame(X, stringsAsFactors = FALSE)
}
<environment: namespace:rdflib>
Is there a way to return a non-regularised query result, i.e. return an rdf instead?
I’m thinking about a use case where maybe it’s better to enrich data by merging rdfs.
I.e. a researcher queries a triplestore through an API (yeyyy open data!), combines their not fully matching but interoperable rdf data with rdf_add (i.e. try to show how a triplestore is better than tabular non-linked data for merging) and then queries the merged rdf to extract an enriched analytical tabular dataset?
covr::package_coverage(pkg_dir)
rdflib Coverage: 100.00%
R/rdf.R: 100.00%
Add tests for being able to serialise to trig and turtle, which at the moment is throwing an error.
Perhaps a test for parsing/serialising each format could be good.
comments: