Loading
  • 21 Aug, 2019

  • By, Wikipedia

User Talk:Lacypaperclip

Hello! The Wikimedia Foundation is asking for your feedback in a survey. We want to know how well we are supporting your work on and off wiki, and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation. You have been randomly selected to take this survey as we would like to hear from your Wikimedia community. The survey is available in various languages and will take between 20 and 40 minutes.

Take the survey now!

You can find more information about this survey on the project page and see how your feedback helps the Wikimedia Foundation support editors like you. This survey is hosted by a third-party service and governed by this privacy statement (in English). Please visit our frequently asked questions page to find more information about this survey. If you need additional help, or if you wish to opt-out of future communications about this survey, send an email through the EmailUser feature to WMF Surveys to remove you from the list.

Thank you!

WMF Surveys, 18:25, 29 March 2018 (UTC)[reply]

Facto Post – Issue 11 – 9 April 2018

Facto Post – Issue 11 – 9 April 2018

The 100 Skins of the Onion

Open Citations Month, with its eminently guessable hashtag, is upon us. We should be utterly grateful that in the past 12 months, so much data on which papers cite which other papers has been made open, and that Wikidata is playing its part in hosting it as "cites" statements. At the time of writing, there are 15.3M Wikidata items that can do that.

Pulling back to look at open access papers in the large, though, there is is less reason for celebration. Access in theory does not yet equate to practical access. A recent LSE IMPACT blogpost puts that issue down to "heterogeneity". A useful euphemism to save us from thinking that the whole concept doesn't fall into the realm of the oxymoron.

Some home truths: aggregation is not content management, if it falls short on reusability. The PDF file format is wedded to how humans read documents, not how machines ingest them. The salami-slicer is our friend in the current downloading of open access papers, but for a better metaphor, think about skinning an onion, laboriously, 100 times with diminishing returns. There are of the order of 100 major publisher sites hosting open access papers, and the predominant offer there is still a PDF.

Red onion cross section

From the discoverability angle, Wikidata's bibliographic resources combined with the SPARQL query are superior in principle, by far, to existing keyword searches run over papers. Open access content should be managed into consistent HTML, something that is currently strenuous. The good news, such as it is, would be that much of it is already in XML. The organisational problem of removing further skins from the onion, with sensible prioritisation, is certainly not insuperable. The CORE group (the bloggers in the LSE posting) has some answers, but actually not all that is needed for the text and data mining purposes they highlight. The long tail, or in other words the onion heart when it has become fiddly beyond patience to skin, does call for a pis aller. But the real knack is to do more between the XML and the heart.


To subscribe to Facto Post go to Wikipedia:Facto Post mailing list. For the ways to unsubscribe, see below.
Editor Charles Matthews, for ContentMine. Please leave feedback for him. Back numbers are here.
Reminder: WikiFactMine pages on Wikidata are at WD:WFM.

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

MediaWiki message delivery (talk) 16:25, 9 April 2018 (UTC)[reply]

Reminder: Share your feedback in this Wikimedia survey