Sci-K 2023

3rd International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment
30 April 2023
co-located with The Web Conf 2023


In the last decades, we have experienced a substantial increase in the volume of published scientific articles and related research objects (e.g., data sets, software packages); a trend that is expected to continue. This opens up fundamental challenges including generating large-scale machine-readable representations of scientific knowledge, making scholarly data discoverable and accessible, and designing reliable, comprehensive, and equitable metrics to assess scientific impact. The main objective of Sci-K is to provide a forum for researchers and practitioners from different disciplines to present, educate, and guide research related to scientific knowledge. Specifically, we foresee three main themes that cover the most important challenges in this field: representation, discoverability, and assessment.


There is an urgent need for flexible, context-sensitive, fine-grained, and machine-actionable representations of scholarly knowledge that are at the same time structured, interlinked, and semantically rich. Scientific Knowledge Graphs (SKGs) are becoming increasingly popular as infrastructures for representing scholarly knowledge. They are large networks describing the actors (e.g., authors, organisations), documents (e.g., publications, patents), ancillary material (e.g., research data, software), contextual information (e.g., projects, funding), and research knowledge (e.g., research topics, tasks, technologies) in this space, as well as their reciprocal relationships. These resources provide substantial benefits to researchers, companies, and policymakers by powering several data-driven services for navigating, analysing, and making sense of research dynamics. Examples of SKGs include Microsoft Academic Graph (MAG), AMiner, Open Academic Graph, Semantic Scholar, PID Graph, Open Research Knowledge Graph, OpenCitations, and the OpenAIRE research graph. In this respect, the main challenge is the design of ontologies able to conceptualise scholarly knowledge, model its representation, and enable its exchange across different SKGs.

It is important that scholarly information is easily findable, discoverable, and visible, so that it can be mined and organised within SKGs. To this end, we need discovery tools able to crawl the Web and identify scholarly data, whether on a publisher’s website or elsewhere – institutional repositories, pre-print servers, open-access repositories, and others. This is a particularly challenging endeavour because it requires a deep understanding of both the scholarly communication landscape and the needs of a variety of stakeholders: researchers, publishers, funders, and the general public. Typically, a paper's open version, pieces of software, and data sets are shared via alternative channels that are disconnected from the journal landing page. This is currently a major obstacle for practitioners trying to create comprehensive knowledge graphs. In brief, the challenges are related to the discovery and extraction of entities and concepts, integration of information from heterogeneous sources, identification of duplicates, finding connections between entities, and identifying conceptual inconsistencies.

Due to the continuous growth in the volume of research output, rigorous approaches for the assessment of research impact are now more valuable than ever. In this context, we urgently need reliable, comprehensive, and equitable metrics and indicators of the scientific impact and merit of publications, data sets, research institutions, individual researchers, and other relevant entities. Scientific impact refers to the attention a research work receives inside its respective and related disciplines, the social/mass media, etc. Scientific merit, on the other hand, relates to the quality aspects of a work, such as its novelty, reproducibility, FAIR-ness, and readability. Nowadays, due to the growing popularity of Open Science initiatives, a large number of useful science-related data sets have been made openly available, paving the way for the synthesis of more sophisticated indicators of scientific impact and merit and, consequently, more rigorous research assessment. For instance, in recent years, we have observed a surge of large SKGs, which provide very rich and relatively clean sources of information about academics, their publications, and relevant metadata. These SKGs can be used for the development of novel research assessment approaches.

Sci-K is calling for high-quality submissions around the three main themes of research related to scientific knowledge: representation, discoverability, and assessment. Topics of interest include, but are not limited to:

Keynote Speaker

Matt Buys


Scaling the Global Data Citation Corpus: An International Collaboration

Abstract: In this presentation, we will explore the newly announced Global Data Citation Corpus, which aims to revolutionize the way we access, share and cite research data. The corpus is a vast, open-access collection of data citations from a variety of sources and disciplines, which will enable the global community to discover and access datasets more easily than ever before. We will delve into the technical aspects of building the corpus, as well as its potential applications in research and beyond. By unlocking the power of data citation, we can promote more open, collaborative and impactful research practices, and facilitate the integration of data into the broader scholarly conversation.

Bio: Matthew is an accomplished thought leader in research and scholarly communications with vast experience in scaling international technical infrastructure. He is the Executive Director of DataCite, a global community that provides scholarly infrastructure services for research outputs and resources. Matthew is a passionate advocate for promoting the sharing and reuse of research to facilitate discovery. He has led DataCite to become a prominent stakeholder in the research community, supporting effective research data management and sharing practices. Matthew was previously the Director of Engagement at ORCID, where he played a significant role in growing the community into an international effort. He frequently speaks at conferences and contributes to various initiatives and working groups aimed at improving global research infrastructure. Based in Amsterdam, Matthew is focused on creating a sustainable global community at DataCite.


This program is aligned with The Web Conference 2023 program

Prologue - Classroom #203
- Workshop Opening - Welcome
- Keynote by Matt Buys - Scaling the Global Data Citation Corpus: An International Collaboration - [Presented remotely]
- Coffee Break - Ballroom Foyer
SESSION A: Representation - Classroom #203
- Graph2Feat: Inductive Link Prediction via Knowledge Distillation. Ahmed E. Samy, Zekarias T. Kefato and Sarunas Girdzijauskas. (Long paper) - [Presented remotely]
- A New Annotation Method and Dataset for Layout Analysis of Long Documents. Aman Ahuja, Kevin Dinh, Brian Dinh, William A. Ingram and Edward Fox. (Long paper)
- Towards InnoGraph: A Knowledge Graph for AI Innovation. M.Besher Massri, Blerina Spahiu, Marko Grobelnik, Vladimir Alexiev, Matteo Palmonari and Dumitru Roman. (Long paper) - [Presented remotely]
- Graph Embedding for Mapping Interdisciplinary Research Networks. Eoghan Cunningham and Derek Greene. (Short paper)
- NASA Science Mission Directorate Knowledge Graph Discovery. Roelien C. Timmer, Megan Mark, Fech Scen Khoo, Marcella Scoczynski Ribeiro Martins, Anamaria Berea, Greg Renard, Kaylin Bugbee and Emily Foshee. (Short paper) - [Presented remotely]
- Lunch Break - Ballroom Foyer
SESSION B: Assessment - Classroom #203
- Assessing Scientific Contributions in Data Sharing Spaces. Kacy Adams, Fernando Spadea, Conor Flynn and Oshani Seneviratne. (Long paper)
- Cross-Team Collaboration and Diversity in the Bridge2AI Project. Huimin Xu, Chitrank Gupta, Zhandos Sembay, Swathi Thaker, Pamela Payne-Foster, Jake Chen and Ying Ding. (Short paper)
- hp-frac: An index to determine Awarded Researchers. Aashay Singhal and Kamalakar Karlapalem. (Short paper) - [Presented remotely]
SESSION C: Discoverability - Classroom #203
- Scientific Data Extraction from Oceanographic Papers. Bartal Eyðfinsson Veyhe, Tomer Sagi and Katja Hose. (Short paper)
- Application of an ontology for model cards to generate computable artifacts for linking machine learning information from biomedical research. Muhammad Amith, Licong Cui, Kirk Roberts and Cui Tao. (Short paper)
- Closing Session
- Coffee Break - Ballroom Foyer

Accepted Papers

Full Papers

Short Papers

Important Dates

Paper submission

February 6th, 2023 (23:59, AoE timezone)

Notification of acceptance

March 6th, 2023


Camera ready due

March 15th, 2023 (hard deadline; previously March 20th, 2023). We apologise for pulling this forward, but there has been a recent change of agreements between TheWebConf and the company preparing the proceedings.

Workshop day

April 30th, 2023


Submission guidelines

Submissions are welcome in the following categories:

The workshop calls for full research papers (up to 8 pages + 2 pages of appendices + 2 pages of references), describing original work on the listed topics, and short papers (up to 4 pages + 2 pages of appendices + 2 pages of references), on early research results, new results on previously published works, demos, and projects. In accordance with Open Science principles, research papers may also be in the form of data papers and software papers (short or long papers). The former present the motivation and methodology behind the creation of data sets that are of value to the community, e.g., annotated corpora, benchmark collections, training sets. The latter present software functionality, its value for the community, and its application to a non-specialist reader. To enable reproducibility and peer review, authors will be requested to share the DOIs of the data sets and the software products described in the articles and thoroughly describe their construction and reuse.

The workshop also calls for vision/position papers (up to 4 pages + 2 pages of appendices + 2 pages of references) providing insights towards new or emerging areas, innovative or risky approaches, or emerging applications that will require extensions to the state of the art. These do not have to include results yet, but should carefully elaborate on the motivation and the ongoing challenges of the described area.

Submissions must adhere to the ACM template and format. Please remember to add Concepts and Keywords. Please use the template in traditional double-column format to prepare your submissions. For example, Word users may use the Word Interim Template, and LaTeX users may use the “sample-sigconf” template. Overleaf users may want to use the ACM proceedings template available in Overleaf.

Submissions for review must be in PDF format. They must be self-contained and written in English. Submissions that do not follow these guidelines, or do not view or print properly, will be rejected without review.

The proceedings of the workshops will be published jointly with The Web Conference 2023 proceedings.

The submission platform for Sci-K 2023 is EasyChair. To submit, please head to EasyChair and, among the several workshops (tracks), select [Workshop] 3rd International Workshop on Scientific Knowledge Representation, Discovery, and Assessment (Sci-K 2023), which is the 9th from the top.


The 2023 ACM Web Conference is an in-person conference with virtual components. All speakers, presenters, and organisers participating in any way at The Web Conference are expected to attend the conference in person. For exceptional reasons (e.g., cost, visa problems), we can offer the option of a virtual presentation, but this option is limited to extreme cases. Hence, please let us know as soon as possible if you may be affected by one of these conditions.

This information is also being clarified on the homepage of the conference. To support the virtual component, ceremonies and keynotes will be live streamed, while pre-recorded videos of the talks will be made available through the Whova platform, which will also be used for interaction with all conference attendees.

To this end, all Sci-K papers that will be presented at the workshop must have a short video so that virtual attendees can access them via Whova. Information about how to upload these videos and when they should be available will follow later. Stay tuned.

Program Committee

Alphabetically ordered.

Organising Committee

Co-chairs for Sci-K 2023 (alphabetically)

Yi Bu

Peking University (CN)

Ying Ding

University of Texas, Austin (USA)

Ágnes Horvát

Northwestern University (USA)

Yong Huang

Wuhan University (CN)

Meijun Liu

Fudan University (CN)

Paolo Manghi

Italian Research Council (CNR), Pisa (IT)

Andrea Mannocci

Italian Research Council (CNR), Pisa (IT)

Francesco Osborne

The Open University, Milton Keynes (UK)

Daniel Romero

University of Michigan (USA)

Dimitris Sacharidis

Université Libre de Bruxelles (ULB), Brussels (BE)

Angelo A. Salatino

The Open University, Milton Keynes (UK)

Misha Teplitskiy

University of Michigan (USA)

Thanasis Vergoulis

“Athena” RC, Athens (GR)

Feng Xia

RMIT University (AU)

Yujia Zhai

Tianjin Normal University (CN)