2nd International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment
April 26th, 2022 — Online
co-located with The Web Conf 2022
In recent decades, we have experienced a substantial increase in the volume of published scientific articles and related research objects (e.g., data sets, software packages), a trend that is expected to continue. This raises fundamental challenges, including generating large-scale machine-readable representations of scientific knowledge, making scholarly data discoverable and accessible, and designing reliable and comprehensive metrics to assess scientific impact. The main objective of Sci-K is to provide a forum for researchers and practitioners from different disciplines to present, learn about, and guide research related to scientific knowledge. Specifically, we foresee three main themes that cover the most important challenges in the field: representation, discoverability, and assessment.
There is an urgent need for flexible, context-sensitive, fine-grained, and machine-actionable representations of scholarly knowledge that are at the same time structured, interlinked, and semantically rich. Scientific Knowledge Graphs (SKGs) are becoming increasingly popular as infrastructures for representing scholarly knowledge. They are large networks describing the actors (e.g., authors, organisations), documents (e.g., publications, patents), ancillary material (e.g., research data, software), contextual information (e.g., projects, funding), and research knowledge (e.g., research topics, tasks, technologies) in this space, as well as their reciprocal relationships. These resources provide substantial benefits to researchers, companies, and policymakers by powering several data-driven services for navigating, analysing, and making sense of research dynamics. Examples of SKGs include Microsoft Academic Graph (MAG), AMiner, Open Academic Graph, ScholarlyData.org, Semantic Scholar, PID Graph, Open Research Knowledge Graph, OpenCitations, and the OpenAIRE research graph. Here, the main challenge is designing ontologies that can conceptualise scholarly knowledge, model its representation, and enable its exchange across different SKGs.
It is important that scholarly information is easily findable, discoverable, and visible, so that it can be mined and organised within SKGs. To this end, we need discovery tools able to crawl the Web and identify scholarly data, whether on a publisher’s website or elsewhere (institutional repositories, pre-print servers, open-access repositories, and others). This is a particularly challenging endeavour because it requires a deep understanding of both the scholarly communication landscape and the needs of a variety of stakeholders: researchers, publishers, funders, and the general public. In addition to the journal landing page, a paper's open version, software, and data sets are often shared via alternative channels that are disconnected from the journal page. This is currently a major obstacle for practitioners trying to create comprehensive knowledge graphs. In brief, the challenges relate to the discovery and extraction of entities and concepts, integration of information from heterogeneous sources, identification of duplicates, finding connections between entities, and identifying conceptual inconsistencies.
Due to the continuous growth in the volume of research output, rigorous approaches for the assessment of research impact are now more valuable than ever. In this context, we need reliable and comprehensive metrics and indicators of the scientific impact and merit of publications, data sets, research institutions, individual researchers, and other relevant entities. Scientific impact refers to the attention a research work receives within its own and related disciplines, in social and mass media, and so on. Scientific merit, on the other hand, relates to the quality aspects of a work, such as its novelty, reproducibility, FAIR-ness, and readability. Due to the growing popularity of Open Science initiatives, a large number of useful science-related data sets have been made openly available, paving the way for the synthesis of more sophisticated indicators of scientific impact and merit and, consequently, more rigorous research assessment. For instance, recent years have seen a surge of large SKGs, which provide very rich and relatively clean sources of information about academics, their publications, and relevant metadata. These SKGs can be used for the development of novel research assessment approaches.
Sci-K is calling for high-quality submissions around the three main themes of research related to scientific knowledge: representation, discoverability, and assessment.
Topics of interest include, but are not limited to:
Abstract: OpenAlex is a comprehensive, open index of scholarly metadata, structured as a heterogeneous graph. It contains information describing approximately 200M scholarly works, drawn from both structured (e.g., Crossref) and unstructured (e.g., institutional repositories, publisher websites) sources, clustered and merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc.), OpenAlex can also describe approximately 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. The tool is built on a completely open source codebase, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the history of OpenAlex, challenges encountered in our accelerated development timeline, and plans for the future.
Allen Institute for AI
Abstract: The Semantic Scholar Academic Graph, or S2AG (pronounced "stag"), is a large, open, heterogeneous knowledge graph of scholarly works, authors, and citations that powers the Semantic Scholar discovery service. S2AG currently contains over 205M publications, 195M authors, and nearly 2.5B citation edges. Semantic Scholar integrates metadata from Crossref, PubMed, Unpaywall, and other sources. In addition, through partnerships with academic publishers and through web crawling, we source and process the full text of nearly 60M publications in order to extract and classify the document structure, including references, citation contexts, figures, tables, and more. S2AG is available via an open API as well as via downloadable monthly snapshots. In this talk, we will describe the S2AG resource as well as the Semantic Scholar Open Research Corpus (S2ORC), a general-purpose, multi-domain corpus for NLP and text mining research.
Sci-K 2022 papers are available in the Companion Proceedings of the Web Conference 2022.
A single video for the whole workshop.
|-||Workshop Opening - Welcome|
|-||Data models for annotating biomedical scholarly publications: the case of CORD-19. Houcemeddine Turki, Mohamed Ali Hadj Taieb, Alejandro Piad-Morffis, Mohamed Ben Aouicha and René Fabrice Bile. (Representation - Long paper)|
|-||Quantifying the topic disparity of scientific articles. Munjung Kim, Jisung Yoon, Woo-Sung Jung and Hyunuk Kim. (Assessment - Short paper)|
|-||Personal Research Knowledge Graphs. Prantika Chakraborty, Sudakshina Dutta and Debarshi Kumar Sanyal. (Representation - Vision paper)|
|-||Sequence-based extractive summarisation for Scientific Articles. Daniel Kershaw and Rob Koeling. (Discovery - Long paper)|
|-||Assessing Network Representations for Identifying Interdisciplinarity. Eoghan Cunningham and Derek Greene. (Representation - Short paper)|
|-||GraphCite: Citation Intent Classification in Scientific Publications via Graph Embeddings. Dan Berrebbi, Nicolas Huynh and Oana Balalau. (Discovery - Short paper)|
|-||Examining the ORKG towards Representation of Control Theoretic Knowledge – Preliminary Experiences and Conclusions. Carsten Knoll. (Representation - Long paper)|
|-||SciNoBo: A Hierarchical Multi-Label Classifier of Scientific Publications. Nikolaos Gialitsis, Sotiris Kotitsas and Haris Papageorgiou. (Discovery - Long paper)|
|-||Beyond reproduction, experiments want to be understood. Jérôme Euzenat. (Assessment - Vision paper)|
|-||Semi-automated Literature Review for Scientific Assessment of Socioeconomic Climate Change Scenarios. Vanessa Schweizer, Jude Kurniawan and Aidan Power. (Assessment - Long paper)|
|-||A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software. Lamia Salsabil, Jian Wu, Muntabir Hasan Choudhury, William A. Ingram, Edward A. Fox, Sarah M. Rajtmajer and C. Lee Giles. (Assessment - Short paper)|
|-||Keynote by Jason Priem - OpenAlex: An open and comprehensive index of scholarly works, citations, authors, institutions, and more|
|-||Keynote by Alex Wade - The Semantic Scholar Academic Graph (S2AG)|
|-||Panel on "What’s next after Microsoft Academic Graph?" - Alex Wade, Jason Priem, Natalia Manola|
February 7th, 2022 (23:59, AoE timezone)
February 1st, 2022
March 3rd, 2022 (23:59, AoE timezone)
March 1st, 2022
March 20th, 2022 (hard deadline)
March 10th, 2022
April 26th, 2022
Submissions are welcome in the following categories:
The workshop calls for full research papers (up to 8 pages + 2 pages of appendices + 2 pages of references), describing original work on the listed topics, and short papers (up to 4 pages + 2 pages of appendices + 2 pages of references), on early research results, new results on previously published works, demos, and projects. In accordance with Open Science principles, research papers may also take the form of data papers and software papers (short or long papers). The former present the motivation and methodology behind the creation of data sets that are of value to the community, e.g., annotated corpora, benchmark collections, training sets. The latter present software functionality, its value for the community, and its application to a non-specialist reader. To enable reproducibility and peer review, authors will be requested to share the DOIs of the data sets and the software products described in the articles and to thoroughly describe their construction and reuse.
The workshop also calls for vision/position papers (up to 4 pages + 2 pages of appendices + 2 pages of references) providing insights into new or emerging areas, innovative or risky approaches, or emerging applications that will require extensions to the state of the art. These need not include results yet, but should carefully elaborate on the motivation and the ongoing challenges of the described area.
Submissions must adhere to the ACM template and format. For LaTeX submissions, use the Master Article Template – LaTeX, and choose the “sigconf” option. (Read more about LaTeX documentation and ACM LaTeX best practices.) For Microsoft Word submissions, use the Interim Layout document. (Read more about the interim sample PDF.) Submissions for review must be in PDF format. Submissions must be self-contained and in English. Submissions that do not follow these guidelines, or do not view or print properly, may be rejected without review. Authors are responsible for ensuring that submissions adhere strictly to the required format.
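For authors unfamiliar with the ACM LaTeX class, the skeleton below is a minimal sketch of a document using the “sigconf” option of the `acmart` class. The title, author details, and section text are placeholders invented for illustration, and the official ACM template and documentation remain the authoritative reference for required fields (such as CCS concepts, keywords, and rights information).

```latex
% Minimal sketch of an ACM "sigconf" document (placeholder content throughout).
\documentclass[sigconf]{acmart}

\begin{document}

% Placeholder metadata: replace with the real title and author details.
\title{Placeholder Title}
\author{First Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}
\email{first.author@example.org}

% In acmart, the abstract is declared before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Body text goes here.

\end{document}
```

Compiling this with a standard LaTeX distribution that ships `acmart` should produce the two-column conference layout. Check the resulting PDF against the page limits for the chosen paper category.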
The proceedings of the workshops will be published jointly with The Web Conference 2022 proceedings.
Submit your contributions via the Sci-K 2022 EasyChair page: https://easychair.org/conferences/?conf=scik2022
Co-chairs for Sci-K 2022 (alphabetically)