Sunday, March 31, 2013

Talking to machines

Over the last week I have remotely followed two events related to Evidence Based Medicine (EBM), both of which took place in Oxford:

+Ben Goldacre spoke at both events. At the Cochrane event he talked about getting better at talking to the public, policy makers and machines. In the last part of his talk, Talking to Machines, he says: "That it's odd how we share results of RCTs (Randomised, Controlled Trials) in C19th essay format!" This is also how the Cochrane Collaboration shares reviews and meta-analyses of clinical trial data.


Structured data in RDF

Instead we should use C21th structured data standards. I was especially pleased to hear how he was even more explicit: "Publish in RDF a good, quality standard, nice data format" [at 36.50 mins]

See also what the web development director at Cochrane, +Chris Mavergames, says in his excellent presentation on how linked data can help free content from the 'container of the article'.


This is related to the work we do on linked clinical data standards, see my recent blog post: CDISC2RDF. That is, semantic web versions of data standards for clinical data on subject/participant level.

Clinical Data Transparency 

Given the recent move towards clinical data transparency (see a good summary in Nature this week: Drug-company data vaults to be opened) I foresee a discussion also on data standards, using semantic web, for the summary level data in clinical study reports and peer-reviewed papers.

An alternative could be to represent the tables in reports and papers as RDF using the RDF Data Cube Vocabulary (for multi-dimensional statistical data), see the CSVImport and the CubeViz projects (Representing and browsing multi-dimensional statistical data as RDF using the RDF Data Cube Vocabulary, previously called Stats2RDF). This EU/FP7 project has used this vocabulary to publish biomedical statistical data, e.g. the WHO's Global Health Observatory dataset (see Publishing and Interlinking the Global Health Observatory Dataset).
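As a rough sketch of what this could look like, the Python below turns the rows of a small summary table into observations in the RDF Data Cube Vocabulary, written out as N-Triples. Only qb:Observation and qb:dataSet come from the vocabulary itself. The example.org namespace, the dimension and measure properties, and the table values are all made up for illustration, and literal escaping is left out to keep it short.

```python
# Sketch only: rows of a summary table expressed as RDF Data Cube observations.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/trial/"  # hypothetical namespace, not a real one


def observation_triples(dataset_id, row_index, row):
    """Return N-Triples lines describing one table row as a qb:Observation."""
    obs = f"<{EX}{dataset_id}/obs/{row_index}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{EX}{dataset_id}> .",
        # The dimension and measure properties below are hypothetical.
        f'{obs} <{EX}dimension/arm> "{row["arm"]}" .',
        f'{obs} <{EX}dimension/outcome> "{row["outcome"]}" .',
        f'{obs} <{EX}measure/count> "{row["count"]}"^^<{XSD_INTEGER}> .',
    ]


# Invented values, only to show the shape of the data.
table = [
    {"arm": "Placebo", "outcome": "Headache", "count": 12},
    {"arm": "Active", "outcome": "Headache", "count": 7},
]

triples = [
    line
    for index, row in enumerate(table, start=1)
    for line in observation_triples("table1", index, row)
]
print("\n".join(triples))
```

Each table cell becomes one observation that carries its dimensions (here arm and outcome) and its measure, which is what makes the numbers comparable across reports.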

A challenge is to express the clinical trial design and other contextual information as structured data to make it possible to do cross-trial reviews and analyses.

Tuesday, February 12, 2013

CDISC2RDF

In a recent article from semanticweb.com (The Voice of Semantic Web Technology and Linked Data Business) the project CDISC2RDF is nicely described: Clinical Studies And The Road To Linked Data.

The project will be presented at the Conference on Semantics in Health Care & Life Sciences (CSHALS) meeting at the end of February by Charlie Mead, co-chair of the W3C’s Health Care and Life Sciences Interest Group (HCLSIG).

Here is a slide deck describing the first deliverable of the project. A refined slide deck will be presented at the CSHALS meeting together with a couple of CDISC2RDF blog posts to describe the transformation process.

Saturday, December 29, 2012

My MOOCs Spring 2013

Great to see that the news program on SVT (Swedish Television) described MOOCs (Massive Open Online Courses) in a news story the other day.


During 2012 I have followed a few courses via one of the organisations mentioned in the news program: Coursera. Two of the courses were excellent, Model Thinking and Fundamentals of Pharmacology, and they are on Coursera's list of 211 (!) courses. The course in "Software Engineering for Software as a Service (SaaS)" was not of the same high quality, and it's not on the list anymore.

For Spring 2013 I have enrolled in three MOOCs. So, now I know what to do while commuting 2 hours per day in the coming months as well :-)


It's great to see how all of this has taken off during 2012, offering courses not only for data nerds like myself but also for many others.

So, I was thinking of my sister when I read these teasers from Coursera:
  • "Ever wonder why people do what they do? This course offers some answers based on the latest research from Social Psychology."
  • "In the course Introductory Human Physiology students learn to recognize and to apply the basic concepts that govern integrated body function (as an intact organism) in the body's nine organ systems."

And, I was thinking of my husband when I saw these nice videos from Coursera:

Sunday, September 16, 2012

Mind maps just begging for RDF triples and formal models

Earlier this week the CDISC English Speaking User Group (ESUG) Committee arranged a webinar: "CDISC SHARE - How SHARE is developing as a project/standard" with Simon Bishop, Standards and Operations Director, GSK. I found the comprehensive presentation from Simon, and his colleague Diane Wold, very interesting.

Interesting, as the presentation exemplifies in an excellent way how "Current standards (company standards, SDTM standards, other standards) do not current deliver the capability we require". I also find the presentation interesting as it exemplifies mind maps as a way forward, as "Diagrams help us understand clinical processes and how this translates into datasets and variables." (Quotes from slide 20 in the presentation: Conclusions.)

Below are a couple of examples of mind maps from the presentation, and also the background to my thinking that they are mind maps just begging for RDF triples and formal models of the clinical and biomedical reality to make them fully ready "both for human understanding and for computer interpretation".


High level mind map from
the Parkinson's disease example (slide 14)

Current standards do not current deliver the capability we require 

This conclusion is backed up in the first half of the presentation with examples from GSK's internal standards and from CDISC's SDTM standards. These are low level data standards specifying data structures and data elements (variables): standards for exchange of data in bulk (in containers such as the SDTM Vital Signs and Lab domains) or standards for exchange of captured data (in specified variables such as data modules for specific blood pressure and temperature measurements). Good examples in the presentation show the challenges in analysing and aggregating clinical data put into SDTM dataset variables as containers "lacking documented relationships between the variables".

Example from data represented in the proposed
SDTM standard for Parkinson's disease (slide 12-13)


Diagrams help us understand clinical processes and how this translates into datasets and variables

There is value in drawing diagrams to understand the higher level of relationships in terms of the clinical processes in which clinical data is captured for different diseases. This is nicely illustrated in the presentation with a couple of diagrams, or "mind maps".

Example of a map of the clinical process
 for the Parkinson's disease example above (slide 15) 
There is also value in drawing diagrams to understand the mid level of relationships in terms of "concepts" *) and "concept variables" and how these should be put into the SDTM variables (in red). (The example below is unfortunately not the same as the above Parkinson's disease examples.)

Example of a map for the concept of
Temperature measurement (slide 28)

Mind maps just begging for RDF triples and formal models

When I see these mind maps I see graphs just begging for RDF triples (subject, predicate, object). That is, the fundamental semantic web standard. See my two earlier blog posts from two presentations at CDISC Interchange Europe: Semantic models for CDISC standards and metadata and Linking Clinical Data Standards.
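To make the idea concrete, here is a minimal Python sketch of how the labelled arrows of a mind map could be written as RDF triples in N-Triples syntax. The edges are invented for illustration and are not taken from the slides, and the example.org namespace and the helper names are my own assumptions.

```python
# Sketch only: each labelled arrow in a mind map becomes one RDF triple.
EX = "http://example.org/mindmap/"  # hypothetical namespace


def to_uri(label):
    """Build a URI from a node or link label, e.g. 'has result' -> .../HasResult."""
    return EX + "".join(word.capitalize() for word in label.split())


def edge_to_ntriple(source, link, target):
    """Write one mind map edge as a subject-predicate-object line in N-Triples."""
    return f"<{to_uri(source)}> <{to_uri(link)}> <{to_uri(target)}> ."


# Invented edges: (source node, link label, target node).
edges = [
    ("temperature measurement", "has result", "body temperature"),
    ("temperature measurement", "has method", "oral"),
]

for edge in edges:
    print(edge_to_ntriple(*edge))
```

The export itself is the easy part. The labels on the arrows only become meaningful to a machine when the predicates are taken from a shared, formally defined model.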

An interesting exercise would be to have the Parkinson's disease example completed in the concept mapping tool (CMAP) the whole way down to SDTM, and to export the mind maps as RDF triples. However, this is nice, but not enough ...

When I see these mind maps I can also see how easy it is to start drawing such diagrams and exporting them as representations of generic mind maps. However, to fulfil the ultimate goal to have them "captured in a way that these can be used both for human understanding and for computer interpretation" the "mind maps" need underlying formal models of the clinical and biomedical reality.
 
Therefore, I see an interesting connection between the high level maps for disease and clinical processes and the Ontology for General Medical Science (OGMS). OGMS is an ontology of entities involved in a clinical encounter and provides a formal theory of disease that has been further elaborated by specific disease ontologies. See my blog post from last year on Disease terminologies and ontologies.


*) The CDISC SHARE project talks about scientific, or research, concepts. They have also been called observation concepts. However, the word "concept" is overused and carries challenges in itself, see From concept to clinical reality.

Kudos to Frederik Malfait, working for Roche and my co-presenter on Semantic models for CDISC data standards and metadata, for pointing me to this presentation.

Wednesday, June 20, 2012

To Whom It May Concern

A nice tweet from Phil Archer (@philarcher1) this morning reminded me of a "triple tweet" I posted earlier this year on the topic of creating data and metadata To Whom It May Concern ('a formal salutation used for opening a letter to an unknown recipient', source: Wikipedia).

So, here's Phil's tweet quoting Sharon Dawes (@ssdawes):


And here is my "triple tweet" on what I see as one of the core values of Linked Data:




I posted them after I had the pleasure to meet David Wood (@prototypo) and Bernadette Hyland (@BernHyland) F2F at a Linked Data and URI workshop in Boston in late January:

Sunday, May 27, 2012

AstraZeneca re-joins W3C HCLS

After a warm and sunny day of kayaking out in the archipelago north of Gothenburg it was nice to catch up on Twitter and see the official announcement from W3C that my employer, AstraZeneca, has joined W3C. It's actually a re-join as we joined W3C in 2006 to participate in the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG).


Update 6 June: I recommend this nice slide deck for an overview of Semantic Web and Related Work at W3C, presented by Ivan Herman (@ivan_herman) at the 2012 Semantic Tech & Business Conference in San Francisco, CA, USA, 5 June.

I attended and reported back from the W3C conference in Edinburgh in May 2006 (WWW2006) and from the next one in Banff in May 2007 (WWW2007) together with my former colleague Bosse Andersson (@bbalsa). My focus was on applying semantic web standards for clinical data and in 2007 Eric Neumann (@ericneumann), one of the HCLS pioneers, and I published a W3C Note in the Drug Safety and Efficacy task force on CDISC's Study Data Tabulation Model (SDTM). And together with most of the members in the HCLS group I co-authored an important article in BMC Bioinformatics: Advancing translational research with the Semantic Web.

In late 2007 I had to focus on other tasks while Bosse and colleagues in the US, Julia Kozlovsky, Elgar Pichler and Otto Ritter, continued the interactions with other parties across life science and health care in two of the HCLS groups: Linking Open Drug Data (LODD) and Translational Medicine Ontology (TMO).

In early 2010, when I returned my job focus to semantic interoperability, AstraZeneca had decided not to renew the W3C membership. To stay updated I started to use social media as a way to engage with the semantic web and linked data community, to follow thought leaders in the intersection between eHealth and Clinical Research, and to share news and insights with colleagues.

In early 2011 Bosse and I wrote a short paper to summarise insights from AstraZeneca's engagement in W3C HCLS and in the EU project Large Knowledge Collider (LarKC). The paper, Linked Data, an opportunity to mitigate complexity in pharmaceutical research and development, starts with a look back to one of the most inspiring meetings I have been in:
During the WWW2007 conference a breakthrough of the Linked  Data idea happened in a session where web experts demonstrated the power of a new generation of the web, a web of data. For us attending the session it was hard to imagine the full potential on what this idea would mean for individual scientists and for a  pharmaceutical company. 
As described in my earlier blog post we do now have a new program in AstraZeneca called Integrative Informatics (i2) establishing the components required to let a linked data cloud grow across R&D. Re-joining W3C and re-connecting with HCLS is one step in this.




Sunday, May 6, 2012

Semantic models for CDISC based standard and metadata management

In mid April we did a presentation at the 2012 CDISC (Clinical Data Interchange Standards Consortium) Interchange Europe with the title: Semantic models for CDISC based standard and metadata management (see our slides and short paper). This time in a sunny, but chilly, Stockholm at a very nice hotel (Elite Marina Tower). Last year Frederik Malfait, consulting at Roche, and I, working for AstraZeneca, had two different presentations at the 2011 conference in Brussels. See my blog post: Linking Clinical Data Standards.

Since then we have seen more interest in semantic web standards in the CDISC community, see for example the article in Applied Clinical Trials Online (@Clin_Trials): Digital Data, the Semantic Web, and Research, by Wayne Kubick, the new CTO of CDISC. This year Frederik and I did a joint presentation with a key message to the CDISC organisation: "Put semantics into the semantics". That is, to start using semantic web standards and linked data principles for the whole suite of CDISC standards. See below our list of proposals.

In my introduction I described the current situation, where the question now is "Not when, but how" to best adopt CDISC standards. At the same time the different CDISC standards are not linked and are published in different formats, and so-called metadata registries (MDR) are requested for robust life cycle management of standards.

Real world use 

In my brief introduction (see slide 5-11) to the core semantic web standard, the so-called RDF triple, I showed an example of how Google uses RDF based standards to improve search (see my previous blog post on schema.org). I also showed how NCI uses RDF to publish the NCI Thesaurus, see RDF/OWL download of NCIt via LexEVS, and how RDF is used for an early version of the domain model for biomedical research (BRIDG), see RDF/OWL representation of BRIDG/ISO21090. In both these cases the RDF is published as XML, but RDF triples can also be published in other serialisation formats (e.g. JSON, Turtle, and N-Triples). I also showed the latest version of the Linked Open Data cloud, with even more linked datasets than the one Frederik and I had in our presentations last year. I then turned over to the main part of our presentation describing two real-world examples of how two sponsors now start to use semantic web standards and linked data principles.
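To illustrate the point about serialisation formats, here is a small Python sketch that writes one and the same triple as N-Triples, as Turtle and as JSON-LD. The example.org namespace and the hasVariable property are hypothetical and only there for illustration. The Turtle helper is deliberately naive and assumes that all three URIs share one namespace.

```python
# Sketch only: the same RDF triple in three serialisation formats.
import json

EX = "http://example.org/"  # hypothetical namespace
triple = (EX + "AE", EX + "hasVariable", EX + "AEOUT")


def as_ntriples(s, p, o):
    """One line per triple, full URIs in angle brackets."""
    return f"<{s}> <{p}> <{o}> ."


def as_turtle(s, p, o, prefix="ex", namespace=EX):
    """Turtle with a prefix declaration; assumes every URI starts with the namespace."""
    def short(uri):
        return f"{prefix}:{uri[len(namespace):]}"
    return f"@prefix {prefix}: <{namespace}> .\n{short(s)} {short(p)} {short(o)} ."


def as_jsonld(s, p, o):
    """JSON-LD without a context: the subject is '@id', the predicate is a key."""
    return json.dumps({"@id": s, p: [{"@id": o}]}, indent=2)


print(as_ntriples(*triple))
print(as_turtle(*triple))
print(as_jsonld(*triple))
```

The three outputs look very different, but they all carry exactly the same subject, predicate and object.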

Linked Data cloud to grow across AstraZeneca R&D

Photo from CDISC Facebook
In AstraZeneca we have a new program called Integrative Informatics (i2) establishing the components required to let a linked data cloud grow across R&D. A key component is the URI policy for how to make, for example, a Clinical Study linkable by giving it a URI, that is a Uniform Resource Identifier, e.g. http://research.data.astrazeneca.com/id/clinicalstudy/D5890C00003. This is an identifier for a clinical study with the study code D5890C00003 that should be persistent and not dependent on any system. In the same way we will give guidance on how to use URIs to make other key entities such as Investigator and Lab linkable. Also standard data elements from CDISC and internal ones to be managed in a future MDR should have URIs to make them linkable. For more information on how URIs are being used in for example the UK and US governments, see my URI design page.
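A URI pattern like the one above can be captured in a few lines of code. The Python below is only a sketch of the idea, not the actual policy: the base and the clinicalstudy example are taken from the text above, while the function name, the lower-casing rule and the "lab" identifier are my own assumptions for illustration.

```python
# Sketch only: minting system-independent URIs from an entity type and a local code.
from urllib.parse import quote

BASE = "http://research.data.astrazeneca.com/id/"  # base from the example above


def mint_uri(entity_type, local_id):
    """Combine an entity type and a local identifier into one URI.

    Both parts are percent-encoded so that spaces or slashes in a code
    cannot change the structure of the URI.
    """
    return f"{BASE}{quote(entity_type.lower(), safe='')}/{quote(local_id, safe='')}"


print(mint_uri("clinicalstudy", "D5890C00003"))
# A made-up lab identifier, only to show the encoding of a space.
print(mint_uri("lab", "Lab 42"))
```

The point of such a function is that the URI depends only on the type of thing and its code, never on the system that happens to hold the data today.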


A semantic web standard based MDR in Roche

Frederik described the schema, content and architecture of the Roche Biomedical MDR. And then he went through a demo using an RDF representation of a CDISC standard example and of an internal Roche standard (you will find the screenshots from the demo at the end of the slide deck). He first showed how the standards could be viewed using a general tool (TopBraid Composer from TopQuadrant, but it could be any other RDF tool such as Protégé, a common open source tool). On slide 20-28 you can see how SDTM model v.1.2, SDTM IG v3.1.2, and SDTM CTs are all linked together (for example Observation Class: Event - Domain: AE - Variable: AEOUT - Submission value: NOT RECOVERED/NOT RESOLVED). And then he showed the same RDF representation via the application Roche Global Standard Data Browser (slide 29-37). Frederik also showed how the linked data standards can be exported in SAS and Excel formats (slide 42-50). And finally, he showed an example from a Roche standard questionnaire.
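The chain from observation class to submission value can be mimicked with a handful of triples. The Python below is a minimal sketch of following such links and is not the Roche schema: the example.org URIs and the property names (hasDomain, hasVariable, hasSubmissionValue) are hypothetical, and only the labels Event, AE, AEOUT and the submission value come from the example above.

```python
# Sketch only: following links from an observation class down to a submission value.
EX = "http://example.org/sdtm/"  # hypothetical namespace

triples = [
    (EX + "class/Event", EX + "hasDomain", EX + "domain/AE"),
    (EX + "domain/AE", EX + "hasVariable", EX + "variable/AEOUT"),
    (EX + "variable/AEOUT", EX + "hasSubmissionValue", "NOT RECOVERED/NOT RESOLVED"),
]


def follow(start, triples):
    """Walk from a start node along the first outgoing link until none is left."""
    path = [start]
    seen = {start}
    current = start
    while True:
        targets = [o for s, _, o in triples if s == current]
        # Stop at a leaf, or if the next node was already visited (cycle guard).
        if not targets or targets[0] in seen:
            return path
        current = targets[0]
        seen.add(current)
        path.append(current)


for step in follow(EX + "class/Event", triples):
    print(step)
```

Because model, implementation guide and controlled terminology share identifiers, a single walk crosses all three standards without any lookup tables in between.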

Proposals to CDISC

In the slides you can see that Frederik had to transform CDISC standards into RDF using a schema he developed for Roche and give them URIs in a Roche namespace (e.g. http://gdsr.roche.com/cdisc/sdtmig-3-1-2#Column.AE.AEOUT for one of the data elements). This is not an ideal approach; instead we would like CDISC to provide these. Hence the drive from our leadership in Roche and AstraZeneca for Frederik and myself to push this back to CDISC.

Below is a draft list of proposals to CDISC:
  • Decide on a URI design for CDISC standards (e.g. http://id.cdisc.org/sdtm).
  • Review the schema Frederik has proposed for the core MDR in CDISC SHARE. 
  • Publish the new SDTM v1.3 and SDTM IG v.3.1.3 as RDF in XML, JSON, Turtle, and N-Triples formats using the reviewed schema and URI design. (As options to current publication formats, i.e. PDF, HTML, CSV, XML/ODM.)
  • Work together with NCI on enhancing the RDF/OWL version of NCI Thesaurus. Also review the option to use the RDF/SKOS standard and apply linked data principles. Publish coming versions of CDISC CTs as RDF in XML, JSON, Turtle, and N-Triples.
  • Work together with NCI on enhancing the RDF/OWL representation of BRIDG/ISO21090 model and apply linked data principles to make all BRIDG classes, properties and ISO21090 data types linkable.
  • Extend the MDR schema for CDISC SHARE for linkage to relevant BRIDG classes and properties and to ISO21090 data types.
  • Start exploring semantic web standards and linked data principles also for clinical data, including making individual clinical data points linkable using URIs and annotating them using existing and emerging clinical standard terminologies and ontologies.