John Sherer, Spangler Family Director, University of North Carolina Press
Abstract: ‘From Suspicion to Sustainability: How open access can save the humanities monograph’
The economics of monograph publishing have never been more challenging, and yet with new digital tools and platforms, this should be a golden age for university press publications. The University of North Carolina Press is leading the Mellon-funded Sustainable History Monograph Pilot to develop a web-based, standardized workflow for the production of open digital editions of high-quality monographs. Can the benefits of lower costs and higher impacts be combined to justify a new funding model for the monograph? This pilot—with over 20 participating presses publishing up to 150 books—will be making that case.
Charles Watkinson, Director, University of Michigan Press
Abstract: ‘From Enhanced Ebook to Interactive Scholarly Works: Ready or Not, Here They Come!’
Academic ebook publishers are increasingly encountering authors whose digitally-enabled research challenges the limits of the print facsimile ebook we have become accustomed to. The majority of these scholars seek to create enhanced ebooks that are familiar in form but may integrate multimedia with text or enable commenting and annotation. A growing minority is developing interactive scholarly works that explode the boundaries of the codex. This presentation explores trends in academic ebook innovation and highlights gaps in the publishing infrastructure currently available to support their publication; from peer review and production to preservation and discovery.
Ros Pyne, Director, Open Access Books, Springer Nature
Abstract: ‘From Enhanced Ebook to Interactive Scholarly Works: Ready or Not, Here They Come!’
Earlier this year Springer Nature published the first machine-generated research book, providing an overview of the latest research in the rapidly growing field of lithium-ion batteries. The talk will explain how we use cross-corpus auto-summarization of a large number of current research articles to help students and researchers manage information overload, and will outline our future plans.
[00:00:16.53] AUDIENCE: Hello.
[00:00:17.23] CHRISTINE TULLEY: Yay, it works. Welcome to the session on the future of the book. We’re going to go ahead and get started. I know we’re a little bit behind time. And we want to make sure we have time for our speakers to present. So we’ll just get right into it.
[00:00:29.40] We’re first going to have Charles Watkinson. He’s Director of the University of Michigan Press. And he’ll be talking to us about trends in enhanced ebooks and interactive scholarly works.
[00:00:38.05] So we’ll start with that, followed by John Sherer, the Spangler Family Director at the University of North Carolina Press. He’ll be talking to us about how open access can save the humanities monograph. And that’s something I’m personally interested in as an English professor. So we’ll hear that.
[00:00:53.61] And that will be followed by Ros Pyne talking about– she’s the Director of Open Access Books at Springer Nature. And she’ll be talking about the first machine-generated research book. So we’re going to go ahead and get started. So, Charles.
[00:01:05.75] CHARLES WATKINSON: Thank you. Oops, saying that’s Ros. Let’s see. So Ros, I’m going to talk about the first–
[00:01:16.12] [CROWD LAUGHS] [00:01:17.42] –first machine-generated. OK, we’re just fixing that one. No, I’ll try. So the first machine-generated research book, it’s really wonderful. You wouldn’t believe it.
[00:01:33.04] Sorry. So I’m Charles Watkinson. I’m from the University of Michigan. And I’m going to talk about the typology of these enhanced books and what we mean by enhanced books at the moment.
[00:01:51.42] And because my mission is always to undermine John Sherer and to spoil every talk that he gives, and because I also thought we were in a different order, this is a quote that he actually puts in his presentation. I really liked it. So sorry, John.
[00:02:09.08] And this is a quote, which I think in 2018, it reflects the feeling that many publishers have had about books and ebooks, the idea that we’re stuck in a world of the print facsimile ebook and none of the affordances of digital are really being used. But I’m going to propose that that is really changing, that we’re on the edge of really quite a wave. And we’re not ready.
[00:02:40.70] So I find this a very helpful typology. This is from an article and report called “The Future of the Monograph in the Digital Era,” which was sponsored by the Andrew W. Mellon Foundation; the lead PI was Michael Elliott from Emory University. And this was published open access in the Journal of Electronic Publishing. He’s got this typology, which is based on talking deeply with faculty at Emory over a period of six months, meeting and talking about what faculty want and how they’re seeing the book dividing up into different forms. And it’s organized on a spectrum of innovation.
[00:03:22.41] So at one end, you have an open ebook, the innovation is that it’s open. And that permits some new kind of work such as open annotation, open peer review, et cetera. But otherwise, it just looks like a regular book.
[00:03:35.21] Then you have digitally-enhanced ebooks that may have links or embedded multimedia. The thing about enhanced ebooks, though, is that you could still produce a print copy. So it could be produced as a print facsimile. The print version isn’t quite as rich as the enhanced ebook, but it’s possible.
[00:03:51.26] And then you have interactive scholarly works. And those are multi-modal works that couldn’t exist in a print environment. And that’s really the bleeding edge. And the place that we’re sitting at the moment I think is more around the enhanced ebooks. But we’re starting to see more and more of these interactive scholarly works that really explode the codex, explode the boundaries of the book. So I think this is a useful typology.
[00:04:15.69] Back in 2013 and 2014 the Mellon Foundation either funded or incentivized a whole lot of projects which were all about trying to help faculty members get beyond the book as their publication format. And the recognition here is that scholars are all digital scholars now, and they’re all using digital tools. So even if you are the crustiest of humanists, you are taking a digital camera into the archives. And you are amassing a trove of digital images during your research.
[00:04:54.79] But many more scholars are using more advanced tools. They’re doing 3D modeling. They’re doing GIS. They’re creating interactive maps.
[00:05:04.54] But all those scholars are faced with the fact that at the point of publication, they have to strip away that richness and just put things into the confines of a 2,000-year-old technology. So Mellon has recognized that. And these projects were to actually change the options for scholars as they worked with digital tools, to change the publishing options.
[00:05:28.48] And what’s resulted five years on is quite an interesting ecosystem of tools, which can go all the way from authoring and editing (tools like Editoria) right through the publishing process to preservation, access, et cetera. And this is a recent report, also funded by Mellon to assess where they had got to with their funding, published with the support of MIT Press by John Maxwell and others. So we’re now at a point where we’re ready in many ways as a publishing industry to really do this work for scholars.
[00:06:10.38] So, Fulcrum is one of the projects that was funded. It’s at the University of Michigan, funded back in 2013: fulcrum.org. And I just wanted to give a little case study here.
[00:06:24.23] So again, thinking back to the typology, Fulcrum has open ebooks on it. The innovation here is not extremely obvious. But there are things like a controlled vocabulary describing the peer review process that this book has gone through, which is really adapted more for the humanities.
[00:06:45.64] There is, of course, integration with tools like Altmetric to talk about impact. But otherwise, this is a fairly regular old book. There’s annotation switched on and things like that. But there’s nothing particularly fancy about this.
[00:07:00.03] The enhanced ebooks that we’re now doing: this is an EPUB reader with an embedded player called Able Player. And that player is rendering a multimedia file from a repository layer. And this plays within the reading experience of this book, which is a book about film.
[00:07:23.05] So really, the fact that a scholar couldn’t see or play the video being referred to made this a very substandard way of delivering the argument that this author wanted to make. The reason I’m naming those different components is that these are all modular components, created through various funding initiatives and bolted together to provide a particular experience. And they’re all open-source products.
[00:07:54.92] And then interactive scholarly works: this is an example of an archaeological report on Fulcrum. And this is one which a reader can actually approach through three different mechanisms. So it’s absolutely true that you may have a social historian of ancient Italy who comes in through the narrative reader layer, because they’re interested in doing a fairly traditional kind of synthesis of that subject across multiple books. And that’s what they want. So that’s still supported.
[00:08:26.24] But it also may be somebody who’s interested in gendered space in Roman villas, in Roman houses. And that person comes in through an interactive 3D model. So that’s a 3D model that you can manipulate.
[00:08:38.81] Or it may be somebody interested in spoons. That archaeologist comes in through a database. And whichever route they’ve come in through, they can then interact with every other tool in the publication. And archaeology is a really interesting subject here, because it was always frustrated as a discipline by the confines of the book. And now archaeologists are able to find their natural form in these kinds of publications.
[00:09:05.45] But these kinds of publications pose real challenges to us as publishers and to the publishing system as a whole. And you can immediately see some of those from your own experience. So, how can we make all the components of this work discoverable by the reader? It’s not enough to make the work itself discoverable; you need to make the different items, the different components, of this work discoverable.
[00:09:30.39] What about peer review? How do we actually peer review a work which has so many different components without building it first? So this is this chicken-and-egg situation. How do you peer review something like this without having to build the whole thing with the risk that then it will be rejected?
[00:09:48.23] And how do you prepare and train peer reviewers to be able to assess work like this from a disciplinary as well as a technical perspective? Because it’s really important that this is not just technology for the sake of technology. It has to be assessed on whether it really advances the discipline of archaeology. And a huge question, very big in archaeology (since excavation is destruction, the act of publication is really a sacred act for archaeologists), is: how do we preserve this kind of work?
[00:10:19.39] So, it’s a problem for the whole system because none of the players in this system are ready for this kind of work. And there are so many of the players that are familiar to us who need to work on the discovery, recognition, and preservation of this kind of work. And there’s so much to do to prepare for this.
[00:10:39.02] And just to say that I think some of that work is starting to happen. But it’s happening way too slowly. And we have especially slipped up on areas like preservation of some of these works. And we need to get out ahead of those problems before this is a wave that overwhelms us.
[00:10:53.87] So just a couple of grants here. One is looking at the preservation of these enhanced ebooks with Portico and CLOCKSS. Another one is looking at how to measure the impact of these publications in a way that respects the intent of the academics in this field, so a way that is not imported from the sciences but actually deeply appreciates the underlying goals of the humanities. Thank you.
[00:11:18.56] [APPLAUSE] [00:11:29.72] JOHN SHERER: I’ll ask your forgiveness in advance. I speak very quickly. And I have a lot of slides. But I’m going to bet that me barking at you in a staccato-like voice for eight or nine minutes is just what you need, because if you’re like me, you might be flagging in the afternoon.
[00:11:43.57] But I do start with a story. Twenty years ago, I was working in a bookstore in Washington, DC. And these executives in very nice suits from publishing houses in New York came down. And they said, John, these bookshelves look great. And these books look fantastic. But you’re going to want to get rid of them all, because we have this great new product you’re going to want to sell. And it’s called a CD-ROM.
[00:11:59.84] So we’ve always been a little dubious about technology. And we’ve never thought that the book could ever be replaced by something that looked like this. So 10 years later, we see this, and we think, OK, we’ve seen this before. This is going to go away really fast. It looks kind of bad.
[00:12:13.26] Something else happened in 2007, though. The economy cratered. The stock market crashed. And there’s suddenly a market for products with lower price points.
[00:12:21.54] And so initially, publishers were actually pretty excited about this shift to digital. You see this hockey stick growth with very high margins. And we think we’re sliding towards an all-digital marketplace. So we’re happy as publishers.
[00:12:32.64] Strangely, outside prognosticators see a vulnerability. And they really beat up on publishers, because it was a historically unique time: the combination of the economic collapse and the digital shift.
[00:12:44.68] Why am I starting here if I’m supposed to be talking about open digital monographs? I start here because a model for humanities monographs relies on a deep understanding of how scholars embrace both digital and print formats in a process of discoverability and use. And we need to clarify how people use digital content. The history has not been well told.
[00:13:02.56] For one thing, none of these prognostications came true. Most of us misunderstood the shift that was happening. We thought that digital transactions would replace print on a one-to-one basis. We thought we were going from vinyl to CD. But we were going from vinyl to Spotify.
[00:13:17.19] It matters because people don’t care about owning digital, like photos on Instagram. If it’s out in the cloud, that’s good enough. It’s now indisputable that digital sales are in decline. This stat is a little bit old, but it’s continuing to look this way. This is happening in trade publishing and in scholarly publishing.
[00:13:32.73] And in fact, there’s Charles’ quote. Print is growing with predictions of continued growth. So we might draw the conclusion that the digital revolution failed, and that the codex has triumphed. The book is, after all, a pretty good piece of technology. And the first wave of ebooks were pretty clumsy.
[00:13:48.70] It always struck me as remarkable that these smart tech companies that develop ebooks didn’t do a better job of iterating on those first rounds of products. But was making a good ebook the actual goal of Amazon, and Google, and Apple? I’d argue that most of these companies weren’t trying to sell books.
[00:14:02.43] The only one that was trying to was Barnes & Noble. And their business is terrible. The big tech companies were playing the long game, creating a walled garden that they wanted to lure customers into.
[00:14:11.13] For example, why was Amazon going to sell millions of ebooks for $9.99 when publishers were charging them, Amazon, more than that for each sale? To say again, Amazon lost as much as $5 a transaction millions of times. Why were they doing this? It’s because they weren’t selling books. They were investing in a mysterious new service.
[00:14:29.40] They were collecting vast inventories of the most lucrative product ever to be introduced into the marketplace. They’re not selling books. They’re buying you. You’re the product on Amazon.
[00:14:37.20] They don’t make money selling an ebook. They make money selling data to publishers about what people are reading. Once they have you as a customer, they don’t need to worry about discounting as much or building a better e-reader. And let’s face it, you don’t even own your digital content. And as long as a company is charging you for access to their cloud, you’ll never own the content.
[00:14:55.52] And even the Millennials don’t seem to care about digital. They like their cracked spines and dog-eared pages. There’s this irony that textbook publishers want digital more than their customers do.
[00:15:04.73] So where does this leave us? This is the American market for you right now. This is brand new data. Print is still here, and it’s definitely where the money’s at. Recent industry stats show ebook revenues are only about 12%, 12%. So we should just forget about all this digital crap, right?
[00:15:20.10] Here’s your problem. And I have agreed to post these slides. So you’re not going to be able to absorb this. This is from Ithaka S+R.
[00:15:25.29] And it shows that in the world of scholarly research, investigations almost always originate digitally. For that first search and reference exploration, scholars prefer digital. But then they pivot to print for reading. Even for light skimming, more scholars prefer print. And by the time they do deep reading, only 8% prefer digital.
[00:15:46.08] So for the specialized monograph, it’s a hybrid experience. The act of reading itself looks very familiar, the same way that your grandparents did it. But discoverability looks very different, even from just a few years ago. Now we have to disaggregate and chapterize our books. But then entire books are also being collected into large aggregations, creating networked environments of humanities scholarship.
[00:16:05.37] This background defines the pilot that we’re undertaking at UNC Press where we’re building a hybrid digital-first model for highly discoverable, open digital editions of specialized monographs with trailing print volumes available for deep immersive reading. Put another way, it’s a model that invests in digital and marginalizes the bespoke formats we’ve conventionally focused on at the university press level.
[00:16:24.63] You need a little background on why UNC Press is doing this. Five years ago at UNC Press, we received a million-dollar grant, part of this bucket of grants from Mellon that Charles mentioned, to expand the publishing services platform we call Longleaf Services, which used to be a traditional customer service and fulfillment platform for six presses. But with the grant, we expanded it to a broad range of scaled back-end publishing services, doing all sorts of things like sales, building websites, printing, managing metadata, exhibits, and even copy editing and design for presses.
[00:16:52.62] And then last year, we went back to Mellon and asked them for another million dollars to use Longleaf as a common platform to try this new digital-first model, where publishing activities are uncoupled and standardized. We’ve branded this the “Sustainable History Monograph Pilot.” And we actually have 23 presses now; Indiana has joined us since I made the slide.
[00:17:10.92] In the new workflow, we want participating presses to acquire and credential a book in the exact same way that they’ve always done. So that’s stage one. But then they turn it over to us at Longleaf essentially at the moment that you would begin copy editing. We’ll handle that as well as the composition and format preparation.
[00:17:25.92] But it’s done in a standardized way using a web-based tool, which will dramatically accelerate the traditional publishing timelines. We’ll release the ebook. And then three months later, the print book becomes available. That’s stage two.
[00:17:36.33] By uncoupling the different stages of publishing and by publishing in digital-first forms, you can drive down costs, and you can separate activities that have to be done at an individual press versus those that can be done at scale through Longleaf. After the book is published using stages one and two, a press can evaluate the impact the book is having, and then after 12 months exercise an option to republish it in a more traditional and bespoke way. We have funding to do between 75 and 125 monographs in the discipline of history in this program.
[00:18:02.59] Presses are reluctant to do this. But we’ll be buying them off with Mellon money. We reimburse them for the cost of acquiring the book. And we pay for copy editing and typesetting. The Godfather line is “an offer they can’t refuse.”
[00:18:14.32] But the biggest obstacle will be authors, who, of course, don’t buy, review, or give awards to their own books. They might want other books to be digital and open access. But theirs is going to be beautiful. So we have to make a positive case to authors.
[00:18:28.75] Oh, and one more thing: since we’re dealing with historians, they’re the worst. So historians won’t like our web-based workflows. They won’t like the standardized features. And they won’t like trailing print. But the hypothesis is that we have an affirmative case that will convince them.
[00:18:40.99] First of all, there will be unparalleled access to scholarship. There’s new JSTOR research that shows a 16-times increase in usage of open digital scholarship over regular paywalled content, including high levels of usage in international and underserved geographies. Our workflows include digital-first features like chapter-level metadata, DOIs, and ORCIDs.
[00:19:01.69] Authors will be provided with quantitative, standardized usage and impact data, if Charles can get his grant done quickly. And they’ll also be given qualitative, customized user surveys indicating how books were discovered and used. Manuscripts in the program will be available six months faster than in print-centric workflows.
[00:19:22.33] But the fundamental reassessment scholarly publishers, university presses, need to accept is the unalloyed benefit of open digital access. In fact, we need to double down on free digital as something not to be feared but as a key part of discoverability. And fortunately, as I’ve suggested, books are different. For deep, immersive reading, the kind of reading that people do with university press books, people want print. So giving away digital not only doesn’t cannibalize print, it helps people discover content they otherwise never would have found and creates new customers for print. OA goes from being a threat to a solution.
[00:19:55.47] So the question I always get is, how will you pay for this? Fair enough. This is not a funding initiative. Mellon has made it clear that they are interested in supporting the experimentation.
[00:20:03.63] But they’re not in the long-term business of subsidizing monographs. They think institutions need to get back in the game. And I agree. But we as publishers have to make the case. We have to convince institutions that we’re publishing more efficiently while having stronger impact.
[00:20:16.70] So this is also from Ithaka S+R. It’s from a report they produced a few years ago about the cost to publish a monograph. It’s not pretty. And this is a meat-and-potatoes monograph, 90,000 words and six illustrations. This is not a complicated one.
[00:20:29.96] So at UNC Press, these are our numbers. It costs us about $30,000 to $40,000 to get the book ready. These are the first-copy costs.
[00:20:37.95] And then we’re going to generate about $15,000 to $20,000 in sales. Which means every time I sign a contract for a monograph, I’m signing a $15,000-to-$20,000 deficit for the press. So this is why a number of open access initiatives, including Knowledge Unlatched and the somewhat ironically named TOME initiative, have subsidies at the $15,000 level for open access.
[00:20:58.92] But I’ve argued that I think that’s flawed thinking. That’s a nice subsidy to a press on the rationale that publishing is hard, which it is. And it’s expensive, which it definitely is, and that you should just give us more money and say thank you.
[00:21:09.49] But our revenue on a monograph is modest, sometimes as low, as I said, as $15,000. The digital revenue is almost too small to measure: a few thousand dollars on a good day, and more likely a few hundred dollars. We have to stop thinking of monograph publishing as a cost-recovery activity and start thinking of it as a fee for service funded by institutions. And that’s why the uncoupling model is so important.
[00:21:31.18] What are the things worth an institution subsidizing versus a press subsidizing? Good peer review? I would say the institution should subsidize that. A $500 license to put a beautiful cover on a jacket? The press should subsidize that. Good copy editing? I would say the institution could fund that. An ad in the New York Review of Books? Definitely the press should be doing that.
[00:21:49.73] If we can change the behaviors of university presses by using a more standardized workflow that meaningfully reduces costs while optimizing digital discoverability, then a per-book fee that went a long way toward covering these essential activities, and provided a hedge to a press against a potential loss of e-sales and potentially even some print sales, might look more like $5,000 to $7,500. That’s what we’re trying to prove with this pilot. That’s approaching an APC.
[00:22:12.95] So think about that at scale. By some estimates, there are around 4,000 new University Press monographs published a year. At $7,500 a book, you get to $30 million super fast. That’s a lot of money. Except right now, libraries are spending tens of millions of dollars to acquire only a fraction of what university presses produce.
[00:22:31.36] Could you see a new model where there are 1,000 libraries in the world that might pay amounts like this in order to support a subscription model? That’s a decent amount of money. But in addition to getting 4,000 new books a year, the libraries would be injecting much-needed steady revenue into university presses.
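The arithmetic behind those figures can be checked in a few lines. The $7,500 per-book fee and the 1,000-library split are the talk's own hypotheticals, and the per-library figure below is simply the implied division, not a number from the talk.

```python
# Back-of-envelope check of the talk's estimates (not real budget data):
# roughly 4,000 new university press monographs a year at a $7,500
# per-book fee, spread across a hypothetical 1,000 subscribing libraries.

books_per_year = 4_000
fee_per_book = 7_500            # upper end of the $5,000-$7,500 range
total = books_per_year * fee_per_book
libraries = 1_000
per_library = total / libraries
print(f"${total:,} total, ${per_library:,.0f} per library")
# → $30,000,000 total, $30,000 per library
```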
[00:22:45.72] It’s supporting scholarship. It’s supporting public engagement. It’s supporting the humanities. It’s supporting books, and facts, and life on Earth.
[00:22:52.73] University presses need to be willing to change their practices. But the truth is, the whole ecosystem that we operate in needs to change: publishers, but also authors, libraries, and institutions. If these Sustainable History Monograph Pilot books don’t get reviewed, if they don’t win awards, if they don’t help authors get tenure, then it’s game over.
[00:23:12.11] Libraries need to rethink purchasing not as something driven by patron demand, a model that tends to surface scholarship that is already well known, but as true collection development. And institutions need to provide the leadership and the very modest funding to ensure a strong future for humanities publishing. Thank you.
[00:23:27.00] [APPLAUSE] [00:23:37.87] ROS PYNE: Hi everyone. I’m Ros Pyne. I’m director of Open Access Books at Springer Nature. And I’m going to tell you about some adventures in machine-generated books at Springer Nature.
[00:23:46.72] So, we’ve set ourselves the challenge of publishing the first-ever machine-generated scientific book. We’ve seen the rise of automated text generation in popular fiction, with quite diverse and interesting results. We’ve got used to automated journalism, such as in sports, auto-generated weather forecasts, or stock market reports. And there’s been remarkable progress in dialogue systems; think of chatbots and smart speakers. But no scientific book yet.
[00:24:17.59] We want to generate new scientific knowledge and value on the basis of recombining existing knowledge. And it’s important to say that this happens under full disclosure. We want to be fully transparent about what we’re doing and avoid any proximity to fake content.
[00:24:34.24] The auto-generated content is explicitly identified as such. And it’s not disguised as having been put together by a human. And here it is. It’s a book in chemistry with the title Lithium-Ion Batteries: A Machine-Generated Summary of Current Research. So let me tell you a bit more about this book.
[00:24:52.66] It’s the first scientific book generated by an algorithm on the basis of recombining, accumulating, and summarizing content in a given subject area. So to get into some technical details, we’re talking about a cross-corpus summarization of existing texts, organized into a coherent, similarity-based sequence. And that’s on the basis of things like keyword overlap, vectorization, publication date, and citations. So we’re crunching down a huge set of papers into a reasonably short book, which allows the reader to speed up the process of digestion instead of reading through tons of different articles and books.
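The similarity-based sequencing step can be sketched like this: represent each paper by its keywords and greedily chain papers by keyword overlap (Jaccard similarity). This is a hypothetical illustration only; the production pipeline also uses vectorization, publication date, and citations, and the paper data below is invented.

```python
# Toy sketch: order papers into a coherent, similarity-based sequence
# using keyword overlap, one of the signals mentioned in the talk.

def jaccard(a, b):
    """Keyword overlap between two papers: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_sequence(papers):
    """Start from the first paper, then repeatedly append the
    not-yet-used paper most similar to the last one chosen."""
    remaining = dict(papers)            # title -> keyword list
    order = [next(iter(remaining))]
    del remaining[order[0]]
    while remaining:
        last = papers[order[-1]]
        nxt = max(remaining, key=lambda t: jaccard(last, remaining[t]))
        order.append(nxt)
        del remaining[nxt]
    return order

papers = {
    "Anode materials":   ["lithium", "anode", "graphite"],
    "Cathode chemistry": ["lithium", "cathode", "oxide"],
    "Graphite ageing":   ["graphite", "anode", "ageing"],
}
print(greedy_sequence(papers))
# → ['Anode materials', 'Graphite ageing', 'Cathode chemistry']
```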
[00:25:28.52] And, of course, it’s a prototype. The book was released and shipped in April. And the ebook is available for free download on SpringerLink. After chemistry, we are considering machine-generated books in other areas. We’d like to go into the humanities and the social sciences, and into interdisciplinary areas as well.
[00:25:47.25] So, how does it work? And I’m going to get into even more technical detail here, but hopefully, stay with me. And the other thing I want to address here is: where is human involvement necessary?
[00:26:00.30] So, first of all, a human needs to select an area of potential interest; say, lithium-ion batteries. Then the algorithmic pipeline retrieves relevant content from a given repository, such as our platform SpringerLink. And for the first book, we chose to use only content from SpringerLink, which did make things a little bit easier.
[00:26:20.71] And then the algorithm automatically clusters the field, and it generates a chapter and subchapter structure. Then it automatically summarizes the texts on the basis of extractive text summarization. That means it extracts the most important and meaningful sentences from the source documents and forms a summary. And we’ve been experimenting with a neural network on top of that.
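Extractive summarization of the kind described here can be illustrated with a toy sketch: score each sentence by the frequency of its content words across the document and keep the top scorers in their original order. This is a simplified stand-in for the production system; the stop-word list and the sample sentences are invented.

```python
# Minimal extractive-summarization sketch: frequent content words
# make a sentence "important"; keep the k best sentences.

from collections import Counter

STOP = {"the", "a", "of", "is", "in", "and", "to", "are", "was"}

def extractive_summary(sentences, k=1):
    words = [w.lower().strip(".,") for s in sentences for w in s.split()]
    freq = Counter(w for w in words if w not in STOP)

    def score(s):
        return sum(freq[w.lower().strip(".,")] for w in s.split()
                   if w.lower().strip(".,") not in STOP)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]  # keep original order

doc = [
    "Lithium-ion batteries store energy in lithium compounds.",
    "The weather was pleasant.",
    "Battery anodes are often made of graphite.",
]
print(extractive_summary(doc, k=1))
```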
[00:26:46.73] And then it paraphrases the texts, using the Stanford Neural Network Dependency Parser and word2vec for syntax restructuring and synonym substitution. Basically, the idea of dependency parsing is to perform a syntactic analysis of a sentence and extract the grammatical relations among the words in that sentence. All the quotes are referenced by hyperlinks to the underlying source document, so we’re being completely transparent about where this content has come from. And, of course, the table of contents and references are created automatically by the pipeline from those hyperlinks.
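The word2vec-style synonym substitution can be sketched with a hand-built embedding table: replace a word with its nearest neighbour when the two vectors are similar enough. The real pipeline uses trained word2vec vectors alongside the Stanford parser, so everything below (the vectors, the threshold, the vocabulary) is an invented illustration of nearest-neighbour substitution, not the production system.

```python
# Toy nearest-neighbour synonym substitution over a fake embedding table.

import math

EMB = {
    "large":   [0.90, 0.10],
    "big":     [0.85, 0.15],
    "battery": [0.10, 0.90],
    "cell":    [0.12, 0.88],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def substitute(word, threshold=0.95):
    """Replace `word` with its most similar other word if close enough;
    unknown words pass through unchanged."""
    if word not in EMB:
        return word
    best, best_sim = word, 0.0
    for other, vec in EMB.items():
        if other == word:
            continue
        sim = cosine(EMB[word], vec)
        if sim > best_sim:
            best, best_sim = other, sim
    return best if best_sim >= threshold else word

print([substitute(w) for w in ["large", "battery", "anode"]])
# → ['big', 'cell', 'anode']
```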
[00:27:23.33] So let’s come back to the question of where human involvement is necessary. A human editor who wants to create an AI book is able to regulate a number of different parameters to get the best results, and also, if you’re a publisher, the appropriate page count. So you can adjust the size of the resource, which gives you a trade-off between granularity and ease of comprehension.
[00:27:50.43] You can adjust how important certain words are. You can break it down into a certain number of sections. And you can also specify the number of documents summarized in each section.
[00:28:04.94] But to be completely clear, we see this as a project which is not just a technology project. We want to solve a problem which is managing information overload amongst readers. And the assumption is that thinking about AI and machine-generated books is one step in the right direction here.
[00:28:24.98] And from the very start, when we first started to think about machine-generated books, we considered the task to be as much a publishing challenge as a technological one. So it's evident from the numerous questions that we've had all the way along that this is an ongoing process, the start of a discussion. And it's 50% technology and 50% a publishing challenge.
[00:28:52.26] And indeed, half the challenge is answering the variety of often publishing-relevant questions related to auto-generated content. And I won't have time today to delve into all of those issues, but amongst them we can include: who's the originator of the content? Who's accountable for it? What about copyright and intellectual ownership?
[00:29:10.87] What about the review process? What does peer review mean in this field? Would a researcher consider him or herself to be a peer to a machine? Who selects what a machine is supposed to generate in the first place? And what about ethical and legal concerns?
[00:29:27.23] And also, who should answer these questions? We think we have to work together to find answers, to find common standards related to machine-generated content. And we consider creating this book as an opportunity to initiate the discussion as early as possible, discussion about future challenges, opportunities, and limitations, and discussion that is taking place between us as publishers but also with research communities and technology experts. And, of course, with this project, we want to push the boundaries in an interesting way. But we are also committed to doing it in a responsible way and in partnership with the rest of the research community.
[00:30:03.18] I want to talk a bit about who's the audience for this type of book. So this book is essentially a meta summary. It's the ultimate literature review. It's designed for reviewers, academic writers, master's and PhD students, anyone active in strategic science decision-making. And by providing a structured literature excerpt from a potentially very large set of papers, it's supposed to be of help to anyone who wants to write a literature survey, anyone starting a PhD, or wanting to find a speedy way into a topic.
[00:30:32.90] Even if the excerpts are recognizable as bot-generated, even if the style is a bit clunky, it’s still helpful in speeding up the literature digestion process. We’ve done user research. We’ve shown it to students. And the feedback was that the book was really relevant, especially at the beginning of a research project, enabling readers to get an overview about what has been done already, enabling those who are writing a thesis to be aware of the relevant literature.
[00:30:58.35] I also want to talk about who's the author. Who's this Beta Writer that we've put on the book? We don't expect that human authors will be replaced with algorithms. On the contrary, we expect that the role of researchers and authors will continue to be critical. But it will substantially change as more and more research content, or parts of research content, is created by algorithms.
[00:31:22.86] To a degree, this development is not that different from automation in manufacturing over the previous centuries. It’s often resulted in a decrease in manufacturers and an increase in designers at the same time. Perhaps the future of scientific content creation will show a similar decrease of writers and an increase of text designers. Or as Ross Goodwin at the Google Arts and Culture said, “when we teach computers to write, the computers don’t replace us anymore than pianos replace pianists– in a certain way, they become our pens, and we become more than writers. We become writers of writers.”
[00:32:01.15] So to finish up, here's some feedback from the media. Overall, we got a very positive reception. There were only a few articles which criticized the quality of the book. And that's fine because it's a prototype, and we learn from it. But even the articles that were critical acknowledged the importance of the prototype as a step in the right direction. And we also got positive feedback from chemistry-specific media.
[00:32:24.82] I think the worst that we heard was that it's kind of a boring read. But, I mean, even that article admitted that we're probably not looking for the new Game of Thrones here. And it was appreciated that this book was freely available to download, so everyone can take a look and understand what we're dealing with here in terms of the first AI book.
[00:32:42.25] And one other thing was that we didn't polish or copyedit this first book. And we had feedback that that was quite helpful because it allows people to see what this machine-generated content looks like in its raw form. And it serves as a benchmark for where the field goes in the future.
[00:32:59.24] So just to finish up, I want to note that I actually only had a minor role in this project. You’ll notice that I’m the Director of Open Access Books. And we didn’t in the end publish it open access, but we did consider it. So I’m here speaking on behalf of my colleagues, and in particular, for Henning Schoenenberger, who led this project in collaboration with a set of colleagues, including those from the Goethe University in Frankfurt. And thanks a lot for listening.
[00:33:23.22] [APPLAUSE] [00:33:29.80] CHRISTINE TULLEY: Thanks, Ros. So I’m going to use a phrase that I think Charles said about– I think you called them the “crusty humanities authors.” And I thought I might use that as a way to get us into our discussion today.
[00:33:42.24] And I am somebody from the Humanities. I’m in an English department if you didn’t hear me speak earlier. I do publish in print monographs. I’ve also done some stuff in digital journals and editing there.
[00:33:52.00] So I have two things I just want to point out and ask a little bit about. And for me, these would be questions of what is lost by these initiatives? Because I understand concerns like, oh, well, a person didn't actually write this book, which applies to Ros's project, or some of those other issues that are out there.
[00:34:06.73] But I’m just thinking about two instances specifically. So one would be, what happens to the traditional print monograph? When it goes out, it gets published, and now it’s out at a conference.
[00:34:16.68] So one of the ways where I sell a lot of copies of my monograph is at the professional academic conference. And so we all know, all the authors all know how it works where the publishers have book exhibit halls, and they have this little U shape where they present all the books. And so the best thing you can hope for as an academic author is to see your monograph in the back of the U because you’re the draw. That’s why you’re there.
[00:34:40.43] And so the first time my book was there, I went to go look, and I was like, oh, phew, it's all the way on the back wall. So I know they think it's going to sell well because it's going to draw people into the U to look at the other books. And so I'm thinking about that when we're talking about some of these other notions of the way books are discovered.
[00:34:56.79] Right now in a Humanities monograph area, my sales– I’ve actually tracked some of this. I mean, I know I can’t track things on Amazon. But the press I publish with does release things out to the authors like where books are sold. And right now, a big chunk of mine are getting sold at academic conferences. So that’s just one thing I want to throw out there for discussion.
[00:35:13.59] And then the second thing, and this goes a little bit with what Ros said: there's a movement now in academic writing in the Humanities called "stylish academic writing." Helen Sword is somebody that's a big proponent of this. And I just wonder with things like a machine book. As an author, I don't have a problem, because at some point the authors already got credit, even though now they're mixed together and scrambled. That would be an easy way to think about it.
[00:35:37.22] But I wonder– I know you talked about it being in the raw form. But I wonder if the quality of the writing itself is then lost. I mean, I am somebody that studies rhetoric, and the nuance, and the way that an argument is presented. And all those things go into it. So with a machine-oriented book, while I don't have a problem with it the way it's set up, I wonder what's being lost in the way that the writing actually happens.
[00:35:59.79] So, I’m going to throw that out to the three of you to fight about if you want to make some comments. And then hopefully we’ll have time for questions at the end. It looks like we’ll have a little bit. Anyone want to go?
[00:36:11.46] ROS PYNE: I guess I'll pick up on the point about style.
[00:36:14.24] CHRISTINE TULLEY: Yeah.
[00:36:15.26] ROS PYNE: So I think one interesting thing about the machine-generated books is there’s a whole spectrum of how much is a person involved and how much is the machine involved? So at the very far end, you have what we did here, which is the machine really just brought together all of these texts, it applied algorithms which made them come together in a way which is rational, but absolutely not stylish. It’s efficient, I guess.
[00:36:38.67] But you could also use this to generate a table of contents for say, a textbook. You could use it to just generate a set of understanding about what’s the latest thinking in an area. And then maybe as a result of that, you go off and you commission new research for a human-written summary of the field in that particular area. So it could be more a commissioning aid at one end. And so I think there’s absolutely still the space for style and for people to be involved.
[00:37:09.45] There could be a middle version where people would go in and really clean up the text, and add style to it, but the sources are still cited and given credit. So I think in some respects this is a separate area; this is just a machine doing something clunky. And we absolutely still want the people writing beautiful prose. And on the other side, it could be a help to the people writing beautiful prose.
[00:37:30.11] CHRISTINE TULLEY: OK. Thank you. Yeah.
[00:37:32.13] CHARLES WATKINSON: I think in the world of enhanced ebooks, the thing that might be lost is any free time for faculty to do anything else.
[00:37:40.80] CHRISTINE TULLEY: Mm-hmm.
[00:37:41.28] CHARLES WATKINSON: Because one of the things that’s really troubling us is just the extra burden that we’re finding it very hard to not pass on to the author, especially because each component now requires that same– that care around permissions, and metadata, and accessibility data, and technical data because it’s all going to be exposed, and it’s all going to be a linked object. And the extra work that that involves is really substantial.
[00:38:12.30] CHRISTINE TULLEY: Can I ask a follow up question with that very quickly? About the technical aspects, are you finding this on the peer review end, just to clarify where you’re–
[00:38:20.22] CHARLES WATKINSON: Mm-hmm. This is more to do with– it’s almost like completing a giant spreadsheet. It’s post peer review. It’s at the point where you’re preparing your book for publication, your manuscript, and now there’s so much more that the author needs to add.
[00:38:36.12] I mean, accessibility is such an important area. I was talking to Simon earlier about it, but one of the things is a really well done accessible book requires that the author really thinks about how they write and how they add alt text and described-by text. And that's a lot of work to do it right.
[00:38:55.14] So I think that’s a particularly– this is all good stuff. It’s all important stuff. But it imposes additional burdens on the author that we can’t ignore. And these are authors who are increasingly contingent faculty, they’re not tenured, they’re fitting this in around huge teaching loads. And I think that’s a real concern.
[00:39:17.87] JOHN SHERER: I’m just reflecting on the irony that I may have made some comments disparaging authors. And within moments, Ros has proposed a model that eliminated them from the workflow. It’s like when I worked in the bookstore, I was like, this would be the best job if it just weren’t for the customers.
[00:39:30.30] So I actually think what we’re selling when we sell a book is an experience. And especially, it takes 10, 15, 20 hours to read a book. And so quality does matter. The reading experience does matter.
[00:39:41.04] I see what they’ve developed as potentially a fantastic tool for helping authors do some drafting, and organizing, and literature review. I see a lot of– much more tidiness in citations and footnotes being much cleaner and less problematic. So I it sounds like it could be a great tool to help people get a head start on a first draft.
[00:40:01.34] But I think ultimately– books sell more than 500 copies when somebody reads one and says, I found it worth 15 hours to read this, and so now, Charles, you should read it, or something like that. So that's when we go from 500 copies to 2,000 copies or 5,000 copies.
[00:40:15.51] The comment about exhibits is kind of amusing because there is– I think there’s a widespread perception in the university press world that that’s on a very negative spiral right now, that fewer people go to these conferences, the exhibits get a lot more expensive, and therefore publishers can barely afford to do it. I would say for these books in our program, I would want someone to come see some sort of a visual representation of your scholarship and then on the plane home be able to read the first two chapters and hopefully be so enthralled by it that the next thing you do is either click on Amazon or tell their library to go buy it for them.
[00:40:49.24] CHRISTINE TULLEY: Thank you very much.
[00:40:50.25] JOHN SHERER: Yeah.
[00:40:50.52] CHRISTINE TULLEY: Let me open it up to other questions out here. Oh, we got one in the back. You can go.
[00:40:54.89] AUDIENCE: Hi. I’m Christina. I’m from Oxford University Press. I have a question about the machine-generated book or information, which I may find it hard to articulate. But the question is about authorship and the selection of content, I suppose.
[00:41:17.06] So my understanding is that the machine has selected all of these things based on the information that was given to it on what basis to select. Like with your little tuner thing, you can do frequency, or volume, or whatever. And I guess my concern would be– and I’m assuming this is something you’ve put some thought into as well– is you are then presenting that as lithium-ion batteries, this is all you need to know. And, of course, there’s got to be a lot of content that didn’t get in there because of the way you set the parameters.
[00:41:54.75] So I guess my question is, what is the thinking behind that? I mean, how do you– how do you modify that or acknowledge that, that this is not the be-all and end-all of lithium-ion batteries?
[00:42:10.38] ROS PYNE: Yeah. I mean, I think– as I said at the beginning, it's very much about transparency, so being very clear firstly that it's a machine-generated book, secondly making it very clear what the corpus is that is being drawn on. In some cases, as you said, the algorithm will help to determine which texts get more precedence. But a human has been involved in identifying which texts are used in the first place and in tweaking the algorithm so that it picks things up appropriately. So I think it's about making clear where it's come from, but also making sure that there really will be human involvement, I think, for some time to come.
[00:42:48.95] And I think this question actually gets really interesting when you get into the Social Sciences and the Humanities where we have been working on these books. And there it gets much harder because you’re talking about arguments, and you’re talking about how do these arguments come together, and are you representing them accurately?
[00:43:03.73] And at the moment, it seems like we need a lot of human intervention at the end to be able to ensure that the topic is presented correctly. Now, the algorithm can obviously learn, so that could improve over time. But, I mean, I think all we can do at the moment is the utmost transparency about what has gone into making the book and be very open about having these questions.
[00:43:28.54] AUDIENCE: Can I get one more?
[00:43:29.95] CHRISTINE TULLEY: Yes.
[00:43:31.35] AUDIENCE: I just wanted to follow that last question.
[00:43:33.23] CHRISTINE TULLEY: A student has the mic, but that’s OK.
[00:43:35.12] AUDIENCE: Sorry. I just wanted to follow that, the last question. It's a similar theme. How do you work out who gets paid? So if you're using excerpts of lots of different articles, and they're pieced together in some way that's curated by this machine, surely if I'm the author of one of those articles, actually, I'm probably due a small royalty on some of that. How does that work when you've got loads of articles involved in this kind of thing?
[00:44:05.35] ROS PYNE: So the algorithm is set up so that it will never excerpt more than three continuous sentences from any one work. And it's drawing on a very large corpus. And we imagine it always would draw on a very large corpus. So it's not as though we're lifting whole paragraphs from other people, which would put us on very difficult ethical ground. But we really are just pulling little sentences from each source and combining them in a way that is logical.
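The three-continuous-sentence cap Pyne describes can be expressed as a simple post-filter. This is an illustrative sketch, not the actual system; the data shape (a list of source-id and sentence-index pairs in summary order) is an assumption.

```python
def cap_excerpt_runs(selected, max_run=3):
    """Drop sentences so that no more than `max_run` consecutive
    sentences from the same source survive.

    `selected` is a list of (source_id, sentence_index) pairs in the
    order they would appear in the generated summary; a run counts as
    contiguous only when the indices are adjacent in the source.
    """
    out, run_src, run_len, prev_idx = [], None, 0, None
    for src, idx in selected:
        contiguous = (src == run_src and prev_idx is not None
                      and idx == prev_idx + 1)
        run_len = run_len + 1 if contiguous else 1
        run_src, prev_idx = src, idx
        if run_len <= max_run:
            out.append((src, idx))
    return out
```

A hard cap like this keeps each borrowing well within quotation-sized fragments, which is the basis of the royalty argument that follows.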
[00:44:37.67] I mentioned that all the books that we used here were Springer Nature books. So they're all books which we have under one license. So that also made it easier.
[00:44:45.77] And I think once you start talking about much broader sets of texts that you're drawing on, OA is an interesting possible area. But I don't think that we're reusing the text in such a way that royalties are due, because they're such tiny pieces of text, in the same way that you might cite a great number of different people in a literature review, and then you wouldn't expect to pay them a royalty for the fact that you've cited a couple of sentences of their work in your lit review.
[00:45:14.63] CHRISTINE TULLEY: Thank you. Unfortunately, we need to stop it here. So if you need to catch any of our presenters, do. Let's go ahead and give a round of applause for a good presentation.
[00:45:23.27] [MUSIC PLAYING]