OER Tech

Paradata - activity data for learning resources

Paradata is a means of recording and openly exchanging data about how, and in what context, learning resources are used.

Why paradata is important

Over the last decade the volume of open educational resources on the web has grown exponentially, boosted by the proliferation of OER initiatives, including the UK OER Programme. While search engines such as Google have made it easier to discover all kinds of content, it has remained difficult to identify the context of educational resources. Whether for teachers, learners or content providers, when it comes to discovering and using educational resources, context is key. Search engines may allow users to discover educational resources but they will say little about how those resources have been used, by whom, in what context and with which outcome.

Formal educational metadata standards have gone some way to addressing this problem, but it has proved extremely difficult to capture the educational characteristics of resources and the nuances of educational context within the constraints of a formal metadata standard. Despite the considerable effort that has gone into the development of formal metadata standards, data models, bindings, application profiles and crosswalks, the ability to quickly and easily find educational resources that match a specific educational context, competency level or pedagogic style has remained elusive.

A new approach to learning resource discovery was developed in 2010 by two US initiatives, the US National Science Digital Library (NSDL)1 and the Learning Registry2, which in addition to recording first-party metadata also focused on sharing second-party usage data, referred to as paradata. The term paradata was first used by the NSDL in early 2010 to describe data about user interactions with learning resources within the NSDL’s STEM Exchange3. Later that year the paradata approach was adopted by the Learning Registry, an initiative funded by the U.S. Department of Education and the U.S. Department of Defense. The Learning Registry is an open source, decentralized content-distribution network of peer-to-peer nodes that can store and forward information about learning resources. The primary purpose of the Learning Registry is to share descriptive metadata and social usage paradata across diverse educational systems.

Paradata is essentially a stream of activity data about a learning resource that provides a dynamic timeline of how and in what context that resource has been used. Paradata is generated as learning resources are used, reused, adapted, contextualized, favorited, tweeted, retweeted and shared. Some of this data is deliberately created by users, e.g. likes, comments and tags, while some is generated incidentally as a result of the resource's use, e.g. hits, download statistics and links to other resources. As more usage data is collaboratively gathered and published, the paradata timeline grows and evolves, amplifying the available knowledge about which educational resources are effective in which learning contexts. Paradata complements existing metadata by providing an additional layer of contextual information. By capturing the user activity related to the resource, paradata can help to elucidate its potential educational utility. The Learning Registry team refer to this approach as “social networking for metadata”4.

Figure: Paradata about a learning resource visualised as a stream of data about the activities in which the resource has been used, similar to the timeline feature in social networking sites such as Facebook.

At the simplest level paradata can be used to record how users interact with a resource by viewing, downloading, sharing, liking, commenting, tagging, etc. Paradata can include information about users of a resource, e.g. age, educational level, geographical location, etc. It can also record contextual information by linking resources with educational standards and curricula, pedagogic approaches and methodologies. In addition, paradata can record complex aggregations of activities, e.g. "between January 2011 and January 2012 lecturers in Engineering, Physics and Maths used this resource 6 times for undergraduate teaching activities".
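The step from individual interactions to an aggregated statement can be sketched in a few lines of Python. This is an illustration only: the field names (actor, subject, verb, object, date, level), the placeholder resource URL and the summarise helper are all assumptions made for this example, and they do not reproduce the NSDL or Learning Registry paradata schemas.

```python
from datetime import date

RESOURCE = "http://example.org/resource/42"  # placeholder URL, not a real resource

# Hypothetical individual paradata events; every field name is illustrative.
events = [
    {"actor": "lecturer", "subject": "Engineering", "verb": "taught",
     "object": RESOURCE, "date": date(2011, 2, 14), "level": "undergraduate"},
    {"actor": "lecturer", "subject": "Physics", "verb": "taught",
     "object": RESOURCE, "date": date(2011, 10, 3), "level": "undergraduate"},
    {"actor": "lecturer", "subject": "Maths", "verb": "taught",
     "object": RESOURCE, "date": date(2011, 11, 21), "level": "undergraduate"},
    {"actor": "student", "subject": "Physics", "verb": "viewed",
     "object": RESOURCE, "date": date(2011, 11, 22), "level": "undergraduate"},
    {"actor": "lecturer", "subject": "Maths", "verb": "taught",
     "object": RESOURCE, "date": date(2012, 6, 1), "level": "undergraduate"},
]


def summarise(events, resource, verb, start, end):
    """Aggregate the matching events into a single summary record."""
    matched = [
        e for e in events
        if e["object"] == resource
        and e["verb"] == verb
        and start <= e["date"] <= end
    ]
    return {
        "object": resource,
        "verb": verb,
        "count": len(matched),
        "subjects": sorted({e["subject"] for e in matched}),
        "period": (start.isoformat(), end.isoformat()),
    }


summary = summarise(events, RESOURCE, "taught", date(2011, 1, 1), date(2012, 1, 31))
print(summary["count"], summary["subjects"])
```

The summary record carries the same kind of information as the quoted example: who used the resource, for what, how often, and over which period.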

The Learning Registry infrastructure is built on Apache CouchDB5, a NoSQL-style, document-oriented database that provides a RESTful JSON API. The initial Learning Registry development implementation, or node, is available as an Amazon Machine Image. This enables anyone to set up their own node on the Amazon cloud quickly and easily. However, as CouchDB is a cross-platform application, nodes can be run on most systems (e.g. Windows, Mac, Linux). A key feature of the Learning Registry is that it is metadata agnostic: in addition to diverse paradata, it will accept legacy metadata in any format and will not attempt to harmonise the metadata it consumes. These approaches represent a potentially interesting solution to the "messy" problem of aggregating usage data from the tens of thousands of open educational resources produced by the UK OER Programmes. In this context a "mess" implies a complex issue that is not well formulated or defined, while a "problem" is a well formulated and defined issue with no single solution6.
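One way to picture the metadata agnostic approach is as an envelope that carries any payload verbatim, alongside a label saying what the payload claims to be. The Python sketch below is an assumption-laden illustration: the wrap function and its field names (payload_schema, submitter, payload) are hypothetical and are not the Learning Registry's actual document format or API.

```python
import json


def wrap(payload, schema_label, submitter):
    """Wrap a payload in a simple envelope without parsing or altering it."""
    return {
        "payload_schema": schema_label,  # a label only; it is never validated
        "submitter": submitter,
        "payload": payload,              # stored exactly as received
    }


# Two very different submissions: legacy XML metadata and a paradata record.
lom_xml = "<lom><general><title>Beam bending</title></general></lom>"
paradata = {"verb": "downloaded", "count": 12}

documents = [
    wrap(lom_xml, "LOM", "example-repository"),
    wrap(paradata, "paradata", "example-repository"),
]

# Each envelope serialises to JSON, the format that a document store
# such as CouchDB exchanges over HTTP.
serialised = [json.dumps(doc) for doc in documents]
```

Because the envelope never inspects the payload, the cost of interpreting it falls on whoever consumes the data, which is the trade-off behind the "messy" data described above.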

Programme approaches to paradata

Since its inception, the Learning Registry development has been of considerable interest to JISC due to the innovative technical approach it adopted to facilitating resource discovery.
 
JISC initially commissioned CETIS to undertake a watching brief on the Learning Registry as the project was being scoped and specifications developed. Experiences from the JISC content creation programmes and the technical approaches adopted by the OER Pilot Programme were fed into the scoping phase. The Learning Registry team also engaged closely with JISC, CETIS and the UK technical development community by participating in hackdays, contributing to several CETIS events, and attending a number of JISC strategic planning meetings. This ongoing communication fostered an appetite among the UK OER community for engaging with emerging innovative approaches, and several of the more mature, technically oriented OER projects took an interest.

JLeRN

In 2011, around the same time that JISC launched the OER Rapid Innovation programme, technical intervention funding was allocated to a small team at Mimas7 to develop an experimental Learning Registry test node, the first to be developed outwith the US. This became known as the JLeRN Experiment8.

The JLeRN Experiment was a proof of concept project run by Mimas, with support from JISC CETIS, to explore the practicalities of configuring and running a Learning Registry node and of getting data in and out of the network. The project also brought together UK technical developers who were interested in working with the Learning Registry and the JLeRN test node.

A number of projects funded by a range of JISC programmes have engaged with JLeRN developments at various levels.

ENGrich 

ENGrich9 at the University of Liverpool is leveraging the Learning Registry to design and develop a customized search engine for visual media relevant to engineering education. Using Google Custom Search10 (with applied filters such as tags, file types and sites/domains) as a primary search engine for images, videos, presentations and Flash movies, the project will pull and push corresponding metadata and paradata to and from the Learning Registry. A user interface is also being developed to enable end users (students and academics) to contribute further data relating to particular resources and their usage. This information is also published to the Learning Registry, and the Learning Registry data is then used to help order any subsequent search. Thus, the Learning Registry plays a central role in "engriching" the visual engineering content beyond the basic results provided by Google search.11
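The idea of using paradata to order search results can be sketched as follows. This is a minimal illustration in Python, not ENGrich's implementation: the rerank function, the example URLs and the use of a single usage count per resource are all assumptions made for the example.

```python
def rerank(results, paradata_counts):
    """Order search results by paradata count, highest first.

    sorted() is stable, so results with equal counts keep the order
    in which the underlying search engine returned them.
    """
    return sorted(
        results,
        key=lambda url: paradata_counts.get(url, 0),
        reverse=True,
    )


# Hypothetical result list, in the order returned by the search engine.
results = [
    "http://example.org/video-a",
    "http://example.org/image-b",
    "http://example.org/slides-c",
    "http://example.org/flash-d",
]

# Hypothetical usage counts gathered from paradata; unlisted items count as 0.
paradata_counts = {
    "http://example.org/slides-c": 5,
    "http://example.org/image-b": 2,
}

ordered = rerank(results, paradata_counts)
```

Resources with recorded usage rise to the top, while resources with no paradata keep their original relative order, so the search engine's own ranking still acts as the tie-breaker.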

Jorum Paradata Enhancement Project

Jorum12 is a national JISC funded DSpace repository for sharing open learning resources and is described more fully in the Resource Management chapter. Jorum is run by Mimas and the Paradata Enhancement Project is being undertaken by Cottage Labs.  The aim of the project is to enhance the exposure of usage statistics from the Jorum Dashboard13, a PHP application which provides a view on the current status of the paradata for the Jorum OER repository, giving users, developers and managers access to this information in new and useful ways. 

Sharing Paradata Across Widget Stores 

SPAWS14 is a collaborative OER Rapid Innovation project involving the University of Bolton, the Open University, KU Leuven, and IMC, which aims to share usage data, such as reviews, ratings, and download statistics, between educational widget stores. SPAWS is building on the Learning Registry and Activity Streams15 to connect together several app stores that share web widgets and gadgets for educators. Each time a user visits a store and reviews, rates or embeds a particular widget or gadget, that information will be syndicated to other stores in the network.11 The project's lessons learnt post comments that the technology works for this use case and that there is an appetite for developing this approach.
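The syndication pattern described here can be sketched in Python. The sketch is hypothetical throughout: the activity is a plain dictionary in an actor-verb-object shape loosely modelled on Activity Streams, and the make_activity and syndicate functions, the "widget" object type, the rating field and the store names are assumptions for illustration rather than SPAWS code or the Learning Registry API.

```python
def make_activity(user, verb, widget_url, published, rating=None):
    """Build an activity as a plain dict in an actor-verb-object shape."""
    activity = {
        "actor": {"objectType": "person", "displayName": user},
        "verb": verb,
        "object": {"objectType": "widget", "url": widget_url},
        "published": published,
    }
    if rating is not None:
        activity["object"]["rating"] = rating  # illustrative field name
    return activity


def syndicate(activity, origin, stores):
    """Copy an activity to every store's feed except the one it came from."""
    for name, feed in stores.items():
        if name != origin:
            feed.append(activity)


# Three hypothetical widget stores, each with an initially empty feed.
stores = {"store_a": [], "store_b": [], "store_c": []}

rating = make_activity(
    user="A. Teacher",
    verb="rate",
    widget_url="http://example.org/widgets/quiz",
    published="2012-04-16T10:00:00Z",
    rating=4,
)
syndicate(rating, origin="store_a", stores=stores)
```

In a real deployment the stores would not share memory; each would publish to, and read from, a shared network such as a Learning Registry node. The in-memory feeds here stand in for that exchange.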

Rapid Innovation Dynamic Learning Maps-Learning Registry (RIDLR)

RIDLR16 is another OER Rapid Innovation project, based at the University of Newcastle, that builds on two previous OER projects, Dynamic Learning Maps17 and the FavOERites18 social bookmarking project. It aims to develop open APIs to harvest and release paradata on OERs from end users, including bookmarks, tags, comments, ratings and reviews, from the Learning Registry and other sources, for specific topics within the context of curriculum and personal learning maps.11

Issues

In articulating the lessons learnt about paradata it is useful to distinguish the issues relating to the Learning Registry architecture from those relating to paradata itself.

Emerging architectures

Regardless of whether or not a network of Learning Registry nodes proliferates across the UK Higher and Further Education sectors, it seems likely that the approach taken to their technical architecture (using NoSQL document-oriented databases, cloud hosting, and RESTful JSON APIs) is indicative of innovative technical developments in the area of large scale data management. For example, the University of Lincoln recently demonstrated the use of another massively scalable NoSQL database, MongoDB19, for handling large volumes of research data. The early barrier to overcome is the need for skills, particularly in NoSQL databases, to be able to handle the messy data inherent to the architecture.

The value of paradata

Although both paradata and the technical approaches for sharing paradata developed by the Learning Registry have aroused considerable interest in the UK F/HE community, these are still relatively experimental and immature technologies, and it is debatable how much impact they will have in the immediate future. While many systems used for managing and sharing OER generate large volumes of paradata in the form of usage statistics, little of this data is currently being surfaced in such a way that it can be analysed. In addition, work undertaken by the OER Data Analysis and Visualisation Project on Jorum resource records revealed only minimal social interactions, in the form of sharing, liking, retweeting, etc., around individual resources20. That said, there is growing anecdotal evidence to suggest that more social sharing occurs around curated collections of resources. For example, a single mention on Stumbleupon of a set of resources released by the University of Oxford on the topic of stress and depression resulted in 20,000 hits on one video in a seven-day period21. This activity was only revealed by a spike in the project's Google Analytics. In another instance, a single page of curated Film Studies resources developed as a personal project by a lecturer at the University of Sussex generated almost 50 Facebook reactions and over 80 tweets22. Further work is required to understand more about how, why and under what circumstances social activity occurs around different types and aggregations of learning resources.

A stable curriculum enables stronger patterns to form out of the data, so it lends itself to a more structured educational content space. It is notable that the Learning Registry developed in parallel with a focus on the K12 curriculum in the US. Though there has been significant interest in the development of the Learning Registry through JISC, it remains to be seen whether an initiative which is primarily focused on surfacing resources for the US schools sector will have a significant impact on UK Higher and Further Education.

Future directions

Taking a network level approach to reuniting content with its context is a new solution to the problem of "educational metadata" as described in the chapter on Resource Description. It does not seem too far-fetched to say that the Learning Registry's technical strategy, and its approach to solving the messy problem of aggregating and surfacing distributed heterogeneous metadata and paradata, is highly likely to influence future technical directions and innovations in resource management and discovery.

References

  1. National Science Digital Library, http://nsdl.org/ 
  2. The Learning Registry, http://www.learningregistry.org/ 
  3.  NSDL's Technical Schema for Paradata Exchange, http://nsdlnetwork.org/stemexchange/paradata/schema 
  4. Rehak, D., (2011), The Learning Registry: Social Networking for Metadata, http://blogs.cetis.ac.uk/othervoices/2011/03/22/thelearningregistry/
  5. Apache CouchDB, http://couchdb.apache.org/ 
  6. Robertson, R.J., Mahey, M., and Allinson, J., (2008), An ecological approach to repository and service interactions, http://repository.jisc.ac.uk/272/1/Introductoryecologyreport.pdf
  7. Mimas, http://mimas.ac.uk/ 
  8. The JLeRN Experiment, http://jlernexperiment.wordpress.com/ 
  9. ENGrich, http://engrich.liv.ac.uk/ 
  10. Google Custom Search Engine, http://www.google.com/cse/ 
  11. Lee, A., Hobson, J., Bienkowski, M., Midgley, S., Currier, S., Campbell, L., Novoselova, T., (2012), Towards Networked Knowledge: The Learning Registry, an infrastructure for online learning resources, in Educational Technology Magazine. 
  12. Jorum, www.jorum.ac.uk 
  13. Jorum Dashboard Beta, http://dashboard.jorum.ac.uk/ 
  14. Wilson, S., (2012), SPAWS in a Nutshell, http://scottbw.wordpress.com/2012/04/16/spaws-in-a-nutshell/ 
  15. Activity Streams, http://activitystrea.ms/ 
  16. Hardy, S., (2012), RIDLR in a Nutshell, http://www.medicine.heacademy.ac.uk/blog/oer-phase-3-blog/2012/jun/15/nutshell-post-for-ridlr/ 
  17. Dynamic Learning Maps, https://learning-maps.ncl.ac.uk/ 
  18. FavOERites, http://oerbookmarking.ncl.ac.uk/ 
  19. MongoDB, http://www.mongodb.org/ 
  20. Hawksey, M., (2012), OER Visualisation Project: How is OER being socially shared – postscript [day 30], http://mashe.hawksey.info/2012/01/oer-visualisation-project-how-is-oer-being-socially-sharedpostscript-day-30-ukoer/ 
  21. Robinson, P., (2012), Re: How is OER being shared (and promoted),  https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=OER-DISCUSS;9bcbab40.1201
  22. Film Studies for Free,  http://filmstudiesforfree.blogspot.co.uk/p/online-film-and-moving-studies-phd.html 
