Draft for Public Comment

Comments by Commenter

  • Ben Companjen

    • “Supporting researchers’ workflows” is a very good objective, but at the same time pretty hard, as different researchers can have very different workflows – especially keeping in mind that these workflows are not for written publications exclusively. I can see and agree that this would translate into “deposit […] with as little effort as possible”, though this requires effort from repository developers to support standard protocols, and from the developers and administrators of *lots of* “other tools” to use those protocols and allow integration with (m)any repositories.

      This user story should not use “paper” in the first sentence, but a more generic term. “Microsoft Word” and “Google Docs” should be properly capitalised.

      Having a submission form could go against the objective of “as little effort as possible”. A full integration with a different platform, I think, would eliminate the step of actively submitting to a repository.

  • coar

    • Comment on General Comments on February 23rd, 2017

      Hi Katherine, I have made a PDF of the full text document. You can download it here: https://www.coar-repositories.org/files/COAR-Next-Generation-Repositories-February-7-2017.pdf

    • Comment on What have we missed? on February 14th, 2017

      One new user story from the Next Generation Repositories Working Group – Batch-based content discovery

      A user wants to discover repository materials via aggregators such as Google Scholar, CORE, … To make that happen, repositories need to implement widely supported batch content discovery mechanisms such as Sitemaps. If, in addition, metadata about content needs to be discovered, ResourceSync (which is based on Sitemaps) can be used, because it allows linking to related resources, including metadata describing those resources.
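      A minimal sketch of such a batch discovery document, assuming the Sitemaps XML format extended with ResourceSync's `rs:md` and `rs:ln` elements. The item dictionary shape and the example.org URLs are hypothetical, and a production resource list would also need paging (Sitemap indexes) for large repositories.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # Sitemaps protocol namespace
RS_NS = "http://www.openarchives.org/rs/terms/"         # ResourceSync extension namespace
ET.register_namespace("", SM_NS)
ET.register_namespace("rs", RS_NS)


def build_resource_list(items):
    """Serialise repository items as a Sitemap that doubles as a ResourceSync resource list.

    Each item is a dict with 'loc' and 'lastmod' keys and an optional 'metadata'
    key pointing at a metadata record (a hypothetical record shape).
    """
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    # Marks the document as a ResourceSync resource list; plain Sitemap consumers can skip it.
    ET.SubElement(urlset, f"{{{RS_NS}}}md", capability="resourcelist")
    for item in items:
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = item["loc"]
        ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = item["lastmod"]
        if "metadata" in item:
            # Typed link from the resource to the metadata record that describes it.
            ET.SubElement(url, f"{{{RS_NS}}}ln", rel="describedby", href=item["metadata"])
    return ET.tostring(urlset, encoding="unicode")
```

      Reusing the Sitemap format means one file can serve general web crawlers and ResourceSync-aware aggregators alike.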

  • Dominique Babini - CLACSO

    • Comment on Principles and Design Assumptions on February 15th, 2017

      I suggest considering a Principle of sustainability (based on collaboration and sharing resources?)

    • Comment on Introduction on February 15th, 2017

      “The nearly ubiquitous deployment of repository systems in higher education and research institutions provides the foundation for a distributed, globally networked infrastructure for scholarly communication.” This may be the right moment for COAR and the WG to consider whether the global networked infrastructure of repositories for scholarly communication will be limited to repositories in higher education and research institutions, or will work together with other repositories managed by the scholarly community but not located in higher education and research institutions: research data repositories, subject repositories, journal repositories. Working together could benefit institutional repositories, since some of their own institutions’ outputs are located in those other repositories, and would contribute to one of the aims of this project: encouraging the creation of cross-repository added-value services and a cross-repository search engine.

    • Comment on Introduction on February 15th, 2017

      Last sentence: “In this context, repositories will provide access to published articles as well as a broad range of artifacts beyond traditional publications such as datasets, pre-prints, working papers, images, software, and so on”. I suggest rephrasing this sentence in the Introduction to acknowledge the present contribution of repositories to next generation repositories: it is not a promise but a reality today, and recognising it would help raise funds for the cross-repository added-value services and cross-repository search engine mentioned in this document. Repositories already provide access to a vast and rich array of content. Looking at the main content types in OpenDOAR repositories, where this sentence mentions published articles, it may be relevant to also mention the other main published content present in repositories today: theses and dissertations, books and book chapters/sections, conference papers and proceedings. Where it gives examples of artifacts beyond traditional publications, it may be relevant to also mention other content present in repositories today: unpublished conference and workshop papers and presentations, unpublished reports and working papers, multimedia and audio-visual materials, learning objects.

    • Comment on What have we missed? on February 15th, 2017

      Congratulations and thank you to COAR for this much needed project, and to the WG for their work. Great vision, objectives and principles in support of a future of scholarly communications managed by the scholarly community!!!
      Great idea to write the User Stories in the first person, and to keep them short.
      I suggest considering a User Story for policy/funding/evaluation users. It would address the needs of research policy/funding/evaluation users (their need for indicators to complement traditional metrics and for consolidated data for their research funding reports, among other needs), and also of scholarly communications policy/funding/evaluation users, who need data to support their decisions concerning repositories and scholarly communications in general.

  • Jane Burpee

    • Comment on General Comments on March 1st, 2017

      Two other ‘stories’ that could be incorporated:
      As a funder
      As an institution

      Both groups may want to generate reports about their communities.
      Can the funder create a report about deposited works generated from their grants?
      Can the institution create a report about their affiliated researchers?

  • Joanna Richardson

    • Comment on What have we missed? on February 28th, 2017

      As an author, when I move from one institution to another, I want to have all my current repository records easily migrated to the next institutional repository. While there are available tools to collect the “obvious” metadata from major sources, I may have records associated with my profile which are unique to me in my previous employment.

  • Katherine Johnson

    • Comment on General Comments on February 8th, 2017

      Would it be possible to have a compiled PDF to download? It would be helpful to be able to read through the document and make notes on the side before responding.

  • Leah Vanderjagt

    • I think that this should be more about standards than the functions or services you list. I think of this as: As a repository manager, I want to be able to pull data from other systems into the repository and send it out to other systems according to a standard I can rely upon for highest accuracy. CASRAI implementation is an example of a standards-based recommendation around workflows.
      Also, most institutional repositories mediate deposit on behalf of researchers for very, very good reasons and so thinking more about the systems integration needs of the repository administrators overseeing that work is something that ought to be considered.

    • Do you see any potential for an editor role inside this system? Groups of scholars may organize within an interoperable repository framework and see value to such a role/want to establish it. I’m not sure open peer review in repositories needs to mean that there is no editor.

    • Comment on 4. Recognizing the user on February 11th, 2017

      I agree that this section is addressing two separate activities/problems of repositories.

    • Comment on 3. Discovering usage rights on February 11th, 2017

      Consider including the Rightsstatements.org initiative as a flag for in-copyright materials.

  • Library Repository Services, UNSW Library

    • Comment on 12. Preservation on March 14th, 2017

      The story is framed as ‘as a scholar I want my outputs preserved’ but institutions may also have preservation-related requirements.

    • Comment on 11. Comparing usage on March 14th, 2017

      This story seems a little confused about how to measure or register usage, referring initially to ‘downloaded and read’ and then later to ‘read and cited’.

      Researchers may want more granular or detailed usage statistics than a raw number of downloads, e.g. broken down by country/region or by the end user’s research area.

    • Reducing depositor effort by auto filling metadata is a good idea, but the ‘title, date and so on’ that can be extracted from a work may not be sufficient metadata to satisfy many of the other use cases (as well as local requirements).

      Could additional metadata (e.g. grant/funding info) be similarly retrieved from other systems and populated? If there is more than one source for a given type of metadata, which should be preferred? Could the model outlined in (8), which connects many individual repositories through a number of hubs, also aggregate data from third-party services as part of the overall metadata for a work?

    • Comment on 9. Data mining on March 14th, 2017

      The user story refers to ‘text or data’ but the body refers to ‘full text content’ – is non-textual content, e.g. images, in scope here?

      As with (6), there are privacy implications for making user interaction data available.

    • This story describes with (7) a ‘hub and spoke’ model of many repositories connected to a smaller number of ‘interaction hubs’ that aggregate and disseminate ‘event’ and other data. Such a distributed system should be architected to avoid single points of failure and tolerate faults such as periods of unavailability for individual hubs/nodes (or what if they disappear permanently?), and to reliably handle a significant body of data – potentially every create, update, comment, citation etc. event for resources across many individual repositories.

      Similar to (7), there is the potential for a flood of resource and event data to be consumed by individual repositories/institutions. How much ‘incoming’ data would a reasonable size research institution be receiving on a daily/hourly/regular basis for all of its researchers and outputs? Should they be able to reduce/filter it somehow?

    • There is the potential for this to be a ‘noisy’ feed aggregating many events (new resources, updates to existing resources, comments etc.) together, unless there is functionality for users to search/filter their feed, customise what types of event they want to receive and so on.

      What is the difference (if any) between an ‘interaction hub’ and a ‘communication platform’?

    • There are privacy implications for making available even anonymised user activity logs.

      How would ‘download or view event’ activity be different to usage statistics/metrics as addressed in (11)?

      ‘Recommender systems are believed to be …’ – by whom?

    • How is quantitative data informing the ‘validity and perceived importance’ of a work collected, generated etc.? ‘Perceived’ implies an element of subjectivity – whose criteria or judgement is involved here?

      How should ‘out of band’ communication on other public platforms be handled – does it need to be captured alongside the comments made against resources within a repository? Is this covered in (7) by ‘content from related activity within [common social media channels] should be curatable’?

    • Comment on 4. Recognizing the user on March 14th, 2017

      In an open access repository comment should be open to all readers, so commenting functionality available to users should be the same regardless of whether they are recognised by the system using a ‘scholarly’ identifier or a social network/some other identity provider.

      Should users be able to restrict or control what user-contributed content can be attached to one of their works in a repository, e.g. disable open commenting?

      The body of the story refers to ‘reviewer’ as a distinct user role – should this be restricted to only users recognised using a ‘scholarly’ identifier? If a reviewer can contribute a review to any work in a repository even without being invited, how should ‘problematic’ reviewers be handled?

    • Comment on 3. Discovering usage rights on March 14th, 2017

      For uncommon/custom licences, a ‘plain English’ version is desirable in addition to the legal text.

      For a crawler to ‘act according to the constraints imposed by the license’ suggests licences need to be made available in a machine-readable format using standard/s that software agents can (implement behaviour to) interpret. Relevant initiatives include Creative Commons Rights Expression Language for describing what a licence permits, prohibits etc. and rightsstatements.org for indicating the in- or out-of-copyright status of a work.

      The statement that ‘ideally’ scholarly resources are made available without restrictions on use seemingly overlooks the potential value of mediated or restricted access to resources for which open access and reuse is not possible e.g. sensitive data.

    • As suggested by the phrasing ‘typed HTTP links’, the Signposting the Scholarly Web (signposting.org) proposal seems to address these stories and their requirements.
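      A minimal sketch of what typed HTTP links on a landing page could look like, assuming the Web Linking (RFC 8288) `Link` header syntax. The relation types shown (`cite-as`, `describedby`), the DOI prefix 10.1234 and the example.org URL are illustrative assumptions, not taken from the draft, and the parser is deliberately naive: it only handles headers produced by `link_header`, not every valid `Link` header.

```python
import re

# Naive pattern for one link-value: <target>; rel="relation"
LINK_VALUE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def link_header(links):
    """Serialise (target URL, relation type) pairs as an HTTP Link header value."""
    return ", ".join(f'<{href}>; rel="{rel}"' for href, rel in links)


def parse_link_header(value):
    """Recover (target URL, relation type) pairs from a header built by link_header."""
    return LINK_VALUE.findall(value)
```

      Because the links travel in HTTP headers rather than in the page body, a machine agent can discover them without parsing HTML.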

    • Comment on Introduction on March 14th, 2017

      General comments
      This is a welcome and well-considered vision of next generation repositories.
      User stories and their components/requirements may need a level of prioritisation.
      A number of stories are linked or overlap and some of their components may be dependent on components in other stories.
      How are these stories to be realised/implemented in a community-driven development scenario favouring ‘convention over configuration’?
      How is this going to be resourced?
      While OAI-PMH may have limitations, it shouldn’t be abandoned until successors/alternatives are well tested and determined to be equally robust.

    • Comment on What have we missed? on March 14th, 2017

      In all the user stories the ‘actor’ is a human or machine user of repositories – what about institutional needs/required functionality relating to their responsibilities for managing research outputs and assets? What about funding bodies, governments, industry partners, et al?

  • Mirko Hanke

    • The EU funded the prototype development of a recommender system in the EEXCESS project (eexcess.eu). If I read their website correctly, the project ended in 2016, but the technological solutions they came up with could perhaps be of use for the development of NG repository interfaces.

  • Monica McCormick

    • My comment is on the final sentence: what do you mean by “curatable”? By whom, in what ways, where?

    • Regarding points 3 and 4: what kinds of quantitative data do you envision? From what are “review scores” derived? I expect that quality criteria and rating systems would vary widely across disciplines. Can you address how you would propose that repositories manage this variation?

    • Do you anticipate any need for private communication between an author and a reviewer? Or the possibility of anonymous reviews? If a system requires that all peer review be conducted in the open, that could create barriers for some content, some authors, and some reviewers.

    • Though you emphasize here a variety of scholarly resources, in the use cases that follow you almost universally mention only papers. I would suggest you consider this fuller range of content types (as well as scholarship in multiple media) to ensure that the use cases address the full range of needs.

  • Nick Szydlowski

    • Comment on 9. Data mining on February 7th, 2017

      I wonder if the development of value-added services needs to be its own user story/use case. As a repository manager, I want to be able to access the metadata in my (or other) repositories in real time through an API in order to build views or services on any platform using the data. This mode of access may be different than what is needed for data mining. I think the example of OAI-PMH is instructive here: the data is available for harvesting – its intended use – but the service is often much too slow to enable direct use as a data source for a web application. OAI-PMH’s harvest model, which seems to be retained in this user story, adds a lot of complexity to any service that wishes to use that data.

      If a repository adds an interface to its data that is tailored to data mining, there’s no guarantee that that service will also effectively facilitate use in web development. Though they may be closely related, I wonder if access to the metadata for the development of value-added services may need to be a separate requirement from access to the text and metadata for text mining.

  • Paul Needham

  • Peter Suber

    • Comment on 11. Comparing usage on February 7th, 2017

      Do you support some version of the (dead or dying) PIRUS standard? https://goo.gl/81HSIV

      I recommend that you consider it. Among other things it would solve this problem: when repositories host copies of the same article, they could share and sum their separate download stats, which in turn would let the author (and other users) see the overall sum, not just the sum for one of the repositories. This would (could) prevent OA repositories from becoming greedy, and trying to block harvesting from other repositories, all in the name of “protecting” their download stats from dilution by other repositories. This kind of greediness is spreading and preventable. We need to remove the obstacles to mutual harvesting between repositories, for example, funder and university repositories. Removing the incentive to “hoard” download stats is a key part of the solution.
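      A minimal sketch of the aggregation step described above, assuming each repository reports download counts keyed by an identifier shared across copies of the same work, such as a DOI (the repository names and DOIs are hypothetical). It is not an implementation of PIRUS: it ignores robot filtering, deduplication and report exchange formats.

```python
from collections import Counter


def combined_downloads(reports):
    """Sum per-item download counts reported by several repositories.

    reports maps a repository name to {shared identifier: download count};
    copies of the same work are matched only through that shared identifier.
    """
    total = Counter()
    for per_item in reports.values():
        total.update(per_item)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

      The key design dependency is the shared identifier: without one, the same article held in a funder repository and a university repository cannot be recognised as one work.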

    • Comment on 9. Data mining on February 7th, 2017

      To support data mining, repositories (and journals, and other minable resources) must move away from PDF-only texts. PDF is fine provided there are also friendlier alternatives to PDF. Are you ready to recommend that?

    • Comment on 4. Recognizing the user on February 7th, 2017

      This section seems to blend two distinct topics: opposition to anonymity (and pseudonymity), and support for standardized IDs. Both topics are important, and I recommend giving them separate treatment.

      On the first: I agree that requiring users to identify themselves “can lead to constructive conversations and the creation or reinforcement of social connections.” But for users vulnerable to persecution or retaliation, only anonymity (or pseudonymity) can do this.

      On the second: You seem to endorse the use of standardized IDs for scholars. Today the best example is ORCID. But you carefully avoid naming it. Do you endorse ORCID and have reasons not to name it? (If so, what reasons?) Or are you deliberately agnostic about which standardized ID we adopt? (If so, why? Do you think any new ID standard has a realistic chance to be better or more widely adopted than ORCID?)

    • There is a tension between adopting widely recognized conventions and minimizing the use of standards. Can you say more about how you’d like to strike the balance?

    • Comment on Introduction on February 7th, 2017

      You’re right that most repositories are not mobile-friendly. But in the rest of the report, you say nothing about making them mobile-friendly.

      I think you’re *not* right that most repositories are “siloed”. But in any case, you should say more about steps repositories could take to facilitate indexing by search engines (large and small, academic and non-academic).

    • Comment on What have we missed? on February 7th, 2017

      Not every institution needs to host its own repository. Many can’t afford to do it at all or to do it well. The solution to this is a consortial (shared or collaborative) repository. You should say something about the benefits of consortial repositories, and their special needs and features.

  • RCAAP Team

    • Comment on 12. Preservation on March 3rd, 2017

      Suggestion: please check the VeraPDF project for PDF/A format relevance, and integrate viewers for certain types of data or datasets (possibly, this requires converting data before presentation).

    • Comment on 11. Comparing usage on March 3rd, 2017

      Please consider relying on standards and protocols (SUSHI-Lite and COUNTER?) and on common rules for cleaning out robots or other non-human usage.

      We also think that repository administrators require statistics from the repository. It would be nice to provide administrators with statistics for collections of resources. Also, the process of generating statistics shouldn’t compromise the main goal of the repository; for instance, it could be done by processing logs.
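      A minimal sketch of generating statistics offline by processing logs, assuming web server logs in the Apache/nginx “combined” format. The robot pattern and the `/bitstream/` file path prefix are hypothetical placeholders; a real service would use a maintained robot list (such as the one COUNTER publishes) and would also need rules like double-click filtering, which are not shown here.

```python
import re
from collections import Counter

# Hypothetical, deliberately short robot pattern; a real service would use a maintained list.
ROBOT_AGENTS = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

# "combined" log format:
# ip ident user [time] "GET path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_human_downloads(lines, file_prefix="/bitstream/"):
    """Count successful GET requests for files, skipping unparsable lines and robot user agents."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("status") != "200":
            continue
        if ROBOT_AGENTS.search(m.group("agent")):
            continue
        if m.group("path").startswith(file_prefix):
            counts[m.group("path")] += 1
    return counts
```

      Running this as a batch job over rotated log files keeps statistics generation out of the request path, so it does not slow down deposit or access.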

    • Please consider integrating web editors with integrated workflows, like http://academia.dodoc.com/
      Also, collections of resources shouldn’t be mandatory. Neither the user nor a system should be required to know the repository’s structure to make a deposit.

    • The use of standardized protocols to ensure interoperability should be addressed. Native support for standard guidelines (like OpenAIRE) and common semantics (COAR, EuroCRIS, CASRAI) should also be considered.

    • Open alternative metrics are needed.
      The behaviour described is similar to the ResearchGate notification system.

    • Please consider: big data, open text mining and ontologies. Also, each person should define their own collection of resources.

    • Comment on 4. Recognizing the user on March 3rd, 2017

      IRs should address the possibility of author profiles and claims. The option of a widely adopted ID should be considered.
      We think this implies that there should be a real link between authors and works.

    • Regarding the use of identifiers, repositories must “hide” the real URL of a specific file, instead using a structure of URI+fileID with a reference to the landing page. This allows the repository to track usage of resources and allows the end user to access the landing page to retrieve more services and functionalities relating to this specific work. Open Journal Systems has implemented this approach for a long time, by embedding the file in a browser window.

    • Regarding this point, additional Guidelines for content and Technical guidelines for discovery (protocols and standards) are needed. Also needed is a new metadata model that allows different stakeholders (researcher, author, organization, funder, government,…) to discover, report on and analyse the repository information.
      We also think that, since we are talking about web repositories, the use of metadata in HTML headers should be mandatory. As for the stored assets, many formats already support embedding metadata alongside the data: for instance, ID3 tags in MP3 files can carry the audio title, album, etc., and the PDF, DOC, PPT and XLS formats also support metadata. A copy of the original asset could be created and automatically filled with metadata provided by the repository, using special converters.
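      A minimal sketch of metadata in HTML headers, assuming the Highwire Press `citation_*` meta tags that Google Scholar documents for indexing (Dublin Core `DC.*` tags are a common alternative). The record fields are hypothetical, and embedding metadata inside the stored files themselves (ID3, PDF document properties) is not covered.

```python
from html import escape


def citation_meta_tags(record):
    """Render Highwire Press-style <meta> tags for a landing page <head>.

    record is a dict with 'title', 'authors', 'date' and optional 'pdf_url'
    (a hypothetical record shape).
    """
    tags = [("citation_title", record["title"])]
    tags += [("citation_author", author) for author in record["authors"]]
    tags.append(("citation_publication_date", record["date"]))
    if "pdf_url" in record:
        tags.append(("citation_pdf_url", record["pdf_url"]))
    # Escape values so titles containing &, < or quotes cannot break the markup.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">' for name, value in tags
    )
```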

  • Rob Merkus

    • Comment on What have we missed? on February 8th, 2017

      What about a translation service of the title and abstract or on demand the full document?

  • Susan Barrett

    • Comment on What have we missed? on March 2nd, 2017

      Thank you for presenting this draft and accepting comments. I am concerned that there is no mention of accessibility in the document. Along with metadata, discovery and access are facilitated for all users when online resources are fully accessible. I would encourage the COAR team to require W3C accessibility standards, for example textual equivalents for audiovisual resources (captions, descriptions and transcripts).

  • Valerie McCutcheon

  • Vicki Picasso, University of Newcastle

    • Comment on Introduction on March 14th, 2017

      The use cases appear to be comprehensive, user focused across multiple roles/needs and inclusive. The range of use cases is obviously meant to capture all types of repositories, so there are requirements that are perhaps less IR-centric but relevant to the broader spectrum of the community. Inclusion of use cases on downloads, usage and standard metrics is an absolute requirement, and I was pleased to see that this was included.

    • Comment on What have we missed? on March 14th, 2017

      In terms of the researcher workflow (and repository managers) and relevant use cases – researcher/author career movement often means a replication or rediscovery/duplication of existing repository content in another repository. This is of course IR centric rather than discipline archives. I didn’t feel that this was captured in the use cases but perhaps this is of less interest in the requirements.

    • Comment on 12. Preservation on March 14th, 2017

      This raises the question of whether repositories should have an archiving system similar to the one CLOCKSS provides for published articles.

    • The key issue here will be the ability to recognise updates. This functionality is already in place via Crossmark for published articles. The DOI would seem to be the key tool here. If Crossref has not already been included in this conversation, it would be worth involving them.

    • This type of function can also be associated with services such as Impactstory or Altmetric. A fundamental decision that needs to be made is whether repositories or repository networks should have a “preferred provider” that they partner with, or should just have an open API that allows other services to build on them.

    • The need here is to be able to seamlessly interact with services such as Publons for peer review credit, Academic Karma for peer review on preprints and with overlay journals such as Discrete Analysis http://discreteanalysisjournal.com/. What is important is that not every repository has to forge these associations themselves, but that they can be negotiated at a global level.

    • Comment on 4. Recognizing the user on March 14th, 2017

      Is there a reason why you would not explicitly mention ORCID here? Obviously there are still publisher-specific IDs being used, but increasingly it would seem important to coalesce around just one identifier.

    • Comment on 3. Discovering usage rights on March 14th, 2017

      The use of licenses other than CC (such as publisher-specific licenses, e.g. https://www.elsevier.com/about/company-information/policies/open-access-licenses) has been hugely detrimental to this. I’d like to see a statement indicating that these should not be used.

    • I am not clear exactly what is being required here – is there a need for a new mechanism or can this issue be handled by resolving to DOIs?

    • I completely agree that the metadata are a vital infrastructure – in fact the key infrastructure that ties all of the scholarly record together. In our view, the lack of uniform metadata for most items in repositories, and its inconsistent application across repositories, is one of the fundamental reasons for the poor visibility of repositories. This is a specific focus for us in Australasia this year.

    • Comment on Introduction on March 14th, 2017

      I appreciate the point that metadata have to be uniform for machines, but would suggest that there may need to be more discussion of the claim that a human’s flexibility allows them to easily overcome diverse user interfaces. I think, especially internationally, we can’t expect that users will tolerate a multitude of interfaces. They are also increasingly unwilling to tolerate interfaces that are not as nimble as what they use in other areas of their lives. Increasingly, being able to access resources from places other than a desktop has to be a priority if repositories are to remain relevant.

    • Comment on What have we missed? on March 14th, 2017

      A key role for repositories has to be reaching out beyond the scholarly community. We know that many organisations and individuals outside of academia use repository content. One possible future role would seem to be providing a level of interpretation of content for non-specialist users.

      Repositories have a role to play in the academic credit system and incentive structure that is not currently addressed, and which is currently dominated by publication in journals with specific impact factors. A way of integrating repositories into the incentive structure is needed to ensure that they become part of the scholarly workflow.

      I didn’t see a discussion of the need to handle the many different types of outputs that are now recognised as scholarly work. Innovating in handling these outputs would seem to be an essential function, and a use case could be built around the integration of different types of content with each other in repositories.

Source: http://comment.coar-repositories.org/comments-by-commenter/