Sci-K 2022

About

In recent decades, the volume of published scientific articles and related research objects (e.g., data sets, software packages) has increased substantially, a trend that is expected to continue. This opens up fundamental challenges, including generating large-scale machine-readable representations of scientific knowledge, making scholarly data discoverable and accessible, and designing reliable and comprehensive metrics to assess scientific impact. The main objective of Sci-K is to provide a forum for researchers and practitioners from different disciplines to present, learn from, and guide research related to scientific knowledge. Specifically, we foresee three main themes that cover the most important challenges in the field: representation, discoverability, and assessment.

Representation

There is an urgent need for flexible, context-sensitive, fine-grained, and machine-actionable representations of scholarly knowledge that are at the same time structured, interlinked, and semantically rich. Scientific Knowledge Graphs (SKGs) are becoming increasingly popular as infrastructures for representing scholarly knowledge. They are large networks describing the actors (e.g., authors, organisations), documents (e.g., publications, patents), ancillary material (e.g., research data, software), contextual information (e.g., projects, funding), and research knowledge (e.g., research topics, tasks, technologies) in this space, as well as their reciprocal relationships. These resources provide substantial benefits to researchers, companies, and policymakers by powering several data-driven services for navigating, analysing, and making sense of research dynamics. Examples of SKGs include Microsoft Academic Graph (MAG), AMiner, Open Academic Graph, ScholarlyData.org, Semantic Scholar, PID Graph, Open Research Knowledge Graph, OpenCitations, and the OpenAIRE research graph. Here, the main challenge is the design of ontologies able to conceptualise scholarly knowledge, model its representation, and enable its exchange across different SKGs.
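
As a rough illustration of the structure described above, an SKG can be thought of as a set of typed nodes connected by typed, directed edges. The sketch below is a minimal in-memory model and not the schema of any of the SKGs listed; the class names, node types, relation names, and identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A typed entity in the graph (all types here are illustrative)."""
    node_id: str
    node_type: str  # e.g. "author", "publication", "dataset", "project", "topic"
    label: str


@dataclass
class ScholarlyGraph:
    """A toy scholarly knowledge graph: typed nodes plus typed, directed edges."""
    nodes: dict = field(default_factory=dict)
    edges: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, relation, target_id):
        # Refuse edges whose endpoints are unknown, so the graph stays consistent.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges[(source_id, relation)].add(target_id)

    def neighbours(self, source_id, relation):
        """Return the labels of nodes reachable from source_id via relation, sorted."""
        targets = self.edges.get((source_id, relation), set())
        return sorted(self.nodes[t].label for t in targets)


# Hypothetical sample data: one publication linked to an author, a data set, and a topic.
graph = ScholarlyGraph()
graph.add_node(Node("a1", "author", "Example Author"))
graph.add_node(Node("p1", "publication", "Example Paper"))
graph.add_node(Node("d1", "dataset", "Example Data Set"))
graph.add_node(Node("t1", "topic", "Knowledge Graphs"))
graph.add_edge("p1", "has_author", "a1")
graph.add_edge("p1", "uses", "d1")
graph.add_edge("p1", "has_topic", "t1")

print(graph.neighbours("p1", "has_author"))
```

Keying edges by (source, relation) keeps lookups simple for this toy case; a production SKG would more likely rely on an ontology-backed store, which is exactly the design challenge this theme addresses.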

Discoverability

It is important that scholarly information is easily findable, discoverable, and visible, so that it can be mined and organised within SKGs. To this end, we need discovery tools able to crawl the Web and identify scholarly data, whether on a publisher’s website or elsewhere: institutional repositories, pre-print servers, open-access repositories, and others. This is a particularly challenging endeavour because it requires a deep understanding of both the scholarly communication landscape and the needs of a variety of stakeholders: researchers, publishers, funders, and the general public. In addition to the journal landing page, a paper's open version, its software, and its data sets are often shared via alternative channels that are disconnected from the journal page. This is currently a major obstacle for practitioners who want to create comprehensive knowledge graphs. In brief, the challenges relate to the discovery and extraction of entities and concepts, the integration of information from heterogeneous sources, the identification of duplicates, the discovery of connections between entities, and the identification of conceptual inconsistencies.
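
One of the challenges above, the identification of duplicates, can be sketched in a few lines. The example assumes that records harvested from different sources carry an optional DOI and a title; it groups them by a normalised DOI and falls back to a normalised title when no DOI is present. The record fields, the `normalise_title`, `dedup_key`, and `group_duplicates` helpers, and the sample records (including the placeholder DOI) are hypothetical, and real pipelines typically need fuzzier matching than this.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_title(title):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def dedup_key(record):
    """Prefer the DOI as a key; fall back to the normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        # Drop a resolver prefix so the same DOI compares equal across sources.
        doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
        return ("doi", doi)
    return ("title", normalise_title(record["title"]))


def group_duplicates(records):
    """Map each key to the list of sources that reported a matching record."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record["source"])
    return dict(groups)


# Hypothetical records, as they might arrive from four different channels.
records = [
    {"source": "publisher", "doi": "https://doi.org/10.1234/EXAMPLE.1", "title": "An Example Paper"},
    {"source": "repository", "doi": "10.1234/example.1", "title": "An example paper."},
    {"source": "preprint", "doi": None, "title": "A Différent  Paper"},
    {"source": "crawl", "doi": "", "title": "a different paper!"},
]

groups = group_duplicates(records)
for key, sources in groups.items():
    print(key, sources)
```

A known limitation of this sketch is that a record with a DOI and a record of the same work without one end up under different keys; merging such cases requires comparing titles, authors, and years across groups.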

Assessment

Due to the continuous growth in the volume of research output, rigorous approaches for the assessment of research impact are now more valuable than ever. In this context, we urgently need reliable and comprehensive metrics and indicators of the scientific impact and merit of publications, data sets, research institutions, individual researchers, and other relevant entities. Scientific impact refers to the attention a research work receives inside its respective and related disciplines, in social and mass media, and elsewhere. Scientific merit, on the other hand, relates to the quality aspects of a work, such as its novelty, reproducibility, FAIR-ness, and readability. Due to the growing popularity of Open Science initiatives, a large number of useful science-related data sets have been made openly available, paving the way for the synthesis of more sophisticated indicators of scientific impact and merit and, consequently, more rigorous research assessment. For instance, recent years have seen a surge of large SKGs, which provide very rich and relatively clean sources of information about academics, their publications, and relevant metadata. These SKGs can be used for the development of novel research assessment approaches.

Topics

Sci-K is calling for high-quality submissions around the three main themes of research related to scientific knowledge: representation, discoverability, and assessment. Topics of interest include, but are not limited to:

Representation
- Data models for the description of scholarly data and their relationships.
- Description and use of provenance information of scientific data.
- Integration and interoperability models of different data sources.
Discoverability
- Methods for extracting metadata, entities and relationships from scientific data.
- Methods for the (semi-)automatic annotation and enhancement of scientific data.
- Methods and interfaces for the exploration, retrieval, and visualisation of scholarly data.
Assessment
- Novel methods, indicators, and metrics for quality and impact assessment of scientific publications, datasets, software, and other relevant entities based on scholarly data.
- Uses of scientific knowledge graphs and citation networks for the facilitation of research assessment.
- Studies regarding the characteristics or the evolution of scientific impact or merit.

Keynote Speakers

Jason Priem

Our Research

OpenAlex: An open and comprehensive index of scholarly works, citations, authors, institutions, and more

Abstract: OpenAlex is a comprehensive, open index of scholarly metadata, structured as a heterogeneous graph. It contains information describing approximately 200M scholarly works, drawn from both structured (e.g., Crossref) and unstructured (e.g., institutional repositories, publisher websites) sources, clustered and merged into distinct records, and linked by citations. By parsing work metadata and enriching it with external PID sources (ROR, ORCID, ISSN Network, PubMed, Wikidata, etc.), OpenAlex is also able to describe approximately 200M author clusters, 100k institutions, and 100k venues (journals and repositories). Using a neural-net classifier, we assign one or more of 50k Wikidata concepts to each work. The tool is built on a completely open source codebase, and data is freely available via a high-performance API, a complete database dump, and a search-engine-style web interface. This talk will describe the history of OpenAlex, challenges encountered in our accelerated development timeline, and plans for the future.

Alex Wade

Allen Institute for AI

The Semantic Scholar Academic Graph (S2AG)

Abstract: The Semantic Scholar Academic Graph, or S2AG (pronounced "stag"), is a large, open, heterogeneous knowledge graph of scholarly works, authors, and citations that powers the Semantic Scholar discovery service. S2AG currently contains over 205M publications, 195M authors, and nearly 2.5B citation edges. Semantic Scholar integrates metadata from Crossref, PubMed, Unpaywall, and other sources. In addition, through partnerships with academic publishers and through web-crawling, we source and process the full text of nearly 60M publications in order to extract and classify the document structure, including references, citation contexts, figures, tables, and more. S2AG is available via an open API as well as via downloadable monthly snapshots. In this talk, we will describe the S2AG resource as well as the Semantic Scholar Open Research Corpus (S2ORC), a general-purpose, multi-domain corpus for NLP and text mining research.

Proceedings

WWW '22: Companion Proceedings of the Web Conference 2022

The Sci-K 2022 papers are available within the Companion Proceedings of the Web Conference 2022.

Recordings

A single video for the whole workshop.

Program

SESSION 1
-	Workshop Opening - Welcome
-	Data models for annotating biomedical scholarly publications: the case of CORD-19. Houcemeddine Turki, Mohamed Ali Hadj Taieb, Alejandro Piad-Morffis, Mohamed Ben Aouicha and René Fabrice Bile. (Representation - Long paper)
-	Quantifying the topic disparity of scientific articles. Munjung Kim, Jisung Yoon, Woo-Sung Jung and Hyunuk Kim. (Assessment - Short paper)
-	Personal Research Knowledge Graphs. Prantika Chakraborty, Sudakshina Dutta and Debarshi Kumar Sanyal. (Representation - Vision paper)
-	Sequence-based extractive summarisation for Scientific Articles. Daniel Kershaw and Rob Koeling. (Discovery - Long paper)
-	Assessing Network Representations for Identifying Interdisciplinarity. Eoghan Cunningham and Derek Greene. (Representation - Short paper)
-	Coffee Break
SESSION 2
-	GraphCite: Citation Intent Classification in Scientific Publications via Graph Embeddings. Dan Berrebbi, Nicolas Huynh and Oana Balalau. (Discovery - Short paper)
-	Examining the ORKG towards Representation of Control Theoretic Knowledge – Preliminary Experiences and Conclusions. Carsten Knoll. (Representation - Long paper)
-	SciNoBo: A Hierarchical Multi-Label Classifier of Scientific Publications. Nikolaοs Gialitsis, Sotiris Kotitsas and Haris Papageorgiou. (Discovery - Long paper)
-	Beyond reproduction, experiments want to be understood. Jérôme Euzenat. (Assessment - Vision paper)
-	Semi-automated Literature Review for Scientific Assessment of Socioeconomic Climate Change Scenarios. Vanessa Schweizer, Jude Kurniawan and Aidan Power. (Assessment - Long paper)
-	A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software. Lamia Salsabil, Jian Wu, Muntabir Hasan Choudhury, William A. Ingram, Edward A. Fox, Sarah M. Rajtmajer and C. Lee Giles. (Assessment - Short paper)
-	Lunch Break
SESSION 3
-	Keynote by Jason Priem - OpenAlex: An open and comprehensive index of scholarly works, citations, authors, institutions, and more
-	Keynote by Alex Wade - The Semantic Scholar Academic Graph (S2AG)
-	Coffee Break
SESSION 4
-	Panel on "What’s next after Microsoft Academic Graph?" - Alex Wade, Jason Priem, Natalia Manola
-	Closing

Accepted Papers

Full Papers

Data models for annotating biomedical scholarly publications: the case of CORD-19. Houcemeddine Turki, Mohamed Ali Hadj Taieb, Alejandro Piad-Morffis, Mohamed Ben Aouicha and René Fabrice Bile; read paper
Sequence-based extractive summarisation for Scientific Articles. Daniel Kershaw and Rob Koeling; read paper
Semi-automated Literature Review for Scientific Assessment of Socioeconomic Climate Change Scenarios. Vanessa Schweizer, Jude Kurniawan and Aidan Power; read paper
SciNoBo: A Hierarchical Multi-Label Classifier of Scientific Publications. Nikolaοs Gialitsis, Sotiris Kotitsas and Haris Papageorgiou; read paper
Examining the ORKG towards Representation of Control Theoretic Knowledge – Preliminary Experiences and Conclusions. Carsten Knoll; read paper

Short Papers

Assessing Network Representations for Identifying Interdisciplinarity. Eoghan Cunningham and Derek Greene; read paper
Quantifying the topic disparity of scientific articles. Munjung Kim, Jisung Yoon, Woo-Sung Jung and Hyunuk Kim; read paper
GraphCite: Citation Intent Classification in Scientific Publications via Graph Embeddings. Dan Berrebbi, Nicolas Huynh and Oana Balalau; read paper
A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software. Lamia Salsabil, Jian Wu, Muntabir Hasan Choudhury, William A. Ingram, Edward A. Fox, Sarah M. Rajtmajer and C. Lee Giles; read paper

Vision Papers

Personal Research Knowledge Graphs. Prantika Chakraborty, Sudakshina Dutta and Debarshi Kumar Sanyal; read paper
Beyond reproduction, experiments want to be understood. Jérôme Euzenat; read paper

Important Dates

Paper submission

February 7th, 2022 (23:59, AoE timezone) ~~February 1st, 2022~~

Notification of acceptance

March 3rd, 2022 (23:59, AoE timezone) ~~March 1st, 2022~~

Camera ready due

March 20th, 2022 (hard deadline) ~~March 10th, 2022~~

Workshop day

April 26th, 2022

Submission guidelines

Submissions are welcome in the following categories:

Full research papers: up to 8 pages + up to 2 pages of appendices (optional) + up to 2 pages of references
Short research papers: up to 4 pages + up to 2 pages of appendices (optional) + up to 2 pages of references
Vision/Position papers: up to 4 pages + up to 2 pages of appendices (optional) + up to 2 pages of references

The workshop calls for full research papers (up to 8 pages + 2 pages of appendices + 2 pages of references), describing original work on the listed topics, and short papers (up to 4 pages + 2 pages of appendices + 2 pages of references), on early research results, new results on previously published works, demos, and projects. In accordance with Open Science principles, research papers may also be in the form of data papers and software papers (short or long papers). The former present the motivation and methodology behind the creation of data sets that are of value to the community; e.g., annotated corpora, benchmark collections, training sets. The latter present software functionality, its value for the community, and its application to a non-specialist reader. To enable reproducibility and peer-review, authors will be requested to share the DOIs of the data sets and the software products described in the articles and thoroughly describe their construction and reuse.

The workshop also calls for vision/position papers (up to 4 pages + 2 pages of appendices + 2 pages of references) providing insights into new or emerging areas, innovative or risky approaches, or emerging applications that will require extensions to the state of the art. These need not include results yet, but should carefully elaborate on the motivation and the ongoing challenges of the described area.

Submissions must adhere to the ACM template and format. For LaTeX submissions, use the Master Article Template – LaTeX, and choose the “sigconf” option. (Read more about LaTeX documentation and ACM LaTeX best practices.) For Microsoft Word submissions, use the Interim Layout document. (Read more about the interim sample PDF.) Submissions for review must be in PDF format. Submissions must be self-contained and in English. Submissions that do not follow these guidelines, or do not view or print properly, may be rejected without review. Authors are responsible for ensuring that submissions adhere strictly to the required format.

The proceedings of the workshops will be published jointly with The Web Conference 2022 proceedings.

Submit your contributions via the Sci-K 2022 EasyChair page: https://easychair.org/conferences/?conf=scik2022

Program Committee

Mehwish Alam (FIZ Karlsruhe, DE)
Simone Angioni (University of Cagliari, IT)
Alessia Bardi (ISTI-CNR, IT)
Alessandra Belfiore (Università degli studi della Campania "Luigi Vanvitelli", IT)
Nikos Bikakis ("Athena" RC, GR)
Russa Biswas (FIZ Karlsruhe, DE)
Davide Buscaldi (Université Paris 13, FR)
Luca D'Aniello (University of Naples Federico II, IT)
Theodore Dalamagas ("Athena" RC, GR)
Michele De Bonis (ISTI-CNR, IT)
Patricia Feeney (CrossRef, USA)
Ornella Irrera (University of Padua, IT)
Mohamad Yaser Jaradeh (L3S Research Center, Leibniz University Hannover, DE)
Ilias Kanellos ("Athena" RC, GR)
Anastasia Krithara (NCSR "Demokritos", GR)
Shubhanshu Mishra (Twitter, USA)
Allard Oelen (L3S Research Center, Leibniz University Hannover, DE)
Diego Reforgiato (Università degli studi di Cagliari, IT)
Jodi Schneider (University of Illinois Urbana Champaign, USA)
Dimitrios Skoutas ("Athena" RC, GR)
Christos Tryfonopoulos (University of the Peloponnese, GR)
Giannis Tsakonas (University of Patras, GR)
Sahar Vahdati (InfAI, DE)