Draft for Public Comment

9. Data mining

As a human or machine user, I want to be able to mine text and data across the collective content of repositories in order to uncover new relationships and make new discoveries. Repositories should enable data mining of their content by allowing third parties to effectively access and transfer full-text objects. The data mining itself is carried out outside the repository network, in the data miner’s infrastructure of choice. Such third parties include aggregators, which need to transfer the available data from repositories to another location effectively and in a timely manner. They also need to keep their own dataset synchronised with the repositories from which they harvest full text, and they must be able to track all changes to the underlying datasets: additions, updates and deletions. To further support the reproducibility of data mining experiments, repositories should expose a log of all such changes. The data offered through the repository machine interfaces must therefore include the metadata, the full-text content and the log of all changes. In addition, repositories should enable the sharing of user interaction data, such as clicks (article-level click-through rate, CTR), co-downloads and comments, to support the development, deployment and evaluation of innovative value-added global services over repositories.
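To illustrate how an aggregator might use such a change log to stay synchronised, the following is a minimal sketch in Python. It is not a proposed interface: the `Change` record, its fields (`sequence`, `item_id`, `kind`, `full_text`) and the `apply_changes` function are hypothetical names chosen for this example, and the draft does not prescribe any particular log format or protocol.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical repository change log (assumed format)."""
    sequence: int                    # position in the log, assumed to increase monotonically
    item_id: str                     # identifier of the repository item
    kind: str                        # "added", "updated" or "deleted"
    full_text: Optional[str] = None  # new content; None for deletions


def apply_changes(mirror: Dict[str, str], changes: Iterable[Change], last_seen: int) -> int:
    """Apply log entries newer than last_seen to the local mirror.

    Returns the sequence number of the last entry applied, which the
    aggregator would store and pass in on its next synchronisation run.
    """
    for change in sorted(changes, key=lambda c: c.sequence):
        if change.sequence <= last_seen:
            continue  # already applied in an earlier run
        if change.kind in ("added", "updated"):
            mirror[change.item_id] = change.full_text or ""
        elif change.kind == "deleted":
            mirror.pop(change.item_id, None)
        else:
            raise ValueError(f"unknown change kind: {change.kind}")
        last_seen = change.sequence
    return last_seen


# Example log covering an addition, an update and a deletion (invented data).
log = [
    Change(1, "item-1", "added", "first version"),
    Change(2, "item-2", "added", "another text"),
    Change(3, "item-1", "updated", "second version"),
    Change(4, "item-2", "deleted"),
]

mirror: Dict[str, str] = {}
position = apply_changes(mirror, log, last_seen=0)
```

Under these assumptions, replaying only a prefix of the log (for example `log[:2]`) into an empty mirror rebuilds the dataset as it stood at that point, which is one way a complete change log could support the reproducibility of a data mining experiment.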


Source: http://comment.coar-repositories.org/7-9-data-mining/