Sem@K: Is my knowledge graph embedding model semantic-aware? (2024)

Review Comment:

The paper presents novel methods for evaluating knowledge graph embedding models with respect to their ability to predict semantically meaningful triples, i.e., triples that satisfy domain and range constraints which are either specified by the KG schema or derived from the data in the KG itself. The authors propose several variants of the sem@K metric, extending their earlier work [*], and use them to perform an extensive evaluation of the "semantic awareness" of popular knowledge graph embedding models on a number of standard datasets, slightly adapted for this purpose.

[*] N. Hubert, P. Monnin, A. Brun and D. Monticolo, Knowledge Graph Embeddings for Link Prediction: Beware of Semantics!, in: DL4KG@ISWC 2022: Workshop on Deep Learning for Knowledge Graphs, held as part of ISWC 2022: the 21st International Semantic Web Conference, Virtual, China, 2022.

The need to extend the evaluation protocols of embedding models with metrics that estimate how well KG embeddings capture the semantic information in the KG has been acknowledged in a number of works cited by the authors. The considered problem is definitely timely and relevant, and it perfectly fits the scope of the Semantic Web journal. The introduced sem@K metrics are a very natural choice for the task, capturing the simple idea of measuring how well KG embeddings preserve domain and range restrictions. In my opinion, the main contribution of the work is the extensive, systematic empirical evaluation of families of KG embedding models with respect to the introduced metrics. Overall, the technical part of the paper is well written, and the examples throughout the work help readers grasp the introduced concepts.
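To make the core idea concrete, the following is a minimal sketch of a sem@K-style computation for tail prediction. It assumes that sem@K averages, over all test queries, the fraction of the top-K ranked candidates whose type satisfies the range of the query relation (head prediction would check the domain symmetrically). The toy entities and classes and the helper names `sem_at_k`, `is_valid_tail`, and `ancestors` are hypothetical; the optional `parent` map illustrates one plausible hierarchy-aware variant (a candidate also counts if a superclass of its type is in the range), which is not necessarily how the paper defines its extended variants.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data structures (not taken from the paper):
#   types:  entity   -> set of asserted classes
#   ranges: relation -> set of classes allowed as tail (range constraint)
#   parent: class    -> direct superclass (root classes are simply absent)


def ancestors(cls: str, parent: Dict[str, str]) -> Set[str]:
    """Return cls together with all of its superclasses."""
    found = {cls}
    while cls in parent:
        cls = parent[cls]
        found.add(cls)
    return found


def is_valid_tail(entity: str, relation: str, types: Dict[str, Set[str]],
                  ranges: Dict[str, Set[str]],
                  parent: Optional[Dict[str, str]] = None) -> bool:
    """True if some class of the entity (optionally closed upwards under
    the class hierarchy) belongs to the range of the relation."""
    classes = set(types.get(entity, set()))
    if parent is not None:
        classes = set().union(*(ancestors(c, parent) for c in classes))
    return bool(classes & ranges[relation])


def sem_at_k(queries: List[Tuple[str, List[str]]], k: int,
             types: Dict[str, Set[str]], ranges: Dict[str, Set[str]],
             parent: Optional[Dict[str, str]] = None) -> float:
    """Mean over queries of the share of the top-k candidate tails that
    satisfy the range constraint of the query relation."""
    total = 0.0
    for relation, ranked in queries:
        top = ranked[:k]
        total += sum(is_valid_tail(e, relation, types, ranges, parent)
                     for e in top) / k
    return total / len(queries)


# Toy example: one tail-prediction query for livesIn, candidates already ranked.
types = {"Chicago": {"City"}, "Paris": {"Capital"}, "Lyon": {"City"}, "Red": {"Color"}}
ranges = {"livesIn": {"City"}}
parent = {"Capital": "City", "City": "Place"}
queries = [("livesIn", ["Chicago", "Red", "Paris", "Lyon"])]

print(sem_at_k(queries, 3, types, ranges))          # 1/3: only Chicago is a City
print(sem_at_k(queries, 3, types, ranges, parent))  # 2/3: Paris counts via Capital -> City
```

Dividing by K assumes that every query returns at least K candidates; a real evaluation would also need to decide how to treat candidates without any asserted type.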

There are several questions/suggestions for improvement from my side:

- The authors discuss related works that also measure the semantic awareness of KG embedding models, but do not directly compare the respective metrics to the introduced ones. For instance, the inc@K metric from [5] seems to reflect the same intuition as sem@K[base] when the ontology only contains domain and range restrictions as well as class disjointness axioms. Is that the case?

- As the main contribution of the paper seems to be the extensive evaluation of the models with respect to their semantic awareness, the evaluation section could be improved to help readers grasp the main message of the paper. While the authors summarize some of the observations in the text, it is often difficult to extract the key messages from the numerous tables. Bar charts presenting the rank-based and semantic-based metrics, instead of (or in addition to) the tables, could be helpful. To avoid overloading the plots and to keep the results digestible, it might be sufficient to report hits@K (resp. sem@K) for a single value of K.

- The provided GitHub link contains the datasets used in the experiments, which seem to be complete. It would be helpful to also share the implementation of the introduced evaluation protocols, along with a README file, to ensure the reproducibility of the results.

- While the KG schema and the type hierarchy are certainly the most immediate choices of semantic artifacts for evaluating the semantic awareness of KG embeddings, in the general case KGs can be accompanied by more expressive ontologies. It might be worthwhile to include a discussion of how the proposed metrics could be extended to account for such ontologies. For example, an ontology axiom might state that "presidents live in capitals". In that case, given that the KG states that Joe is a president, the prediction "Joe lives in Chicago" would violate this axiom, while still being semantically correct with respect to the domain and range restrictions of the "livesIn" relation. Another aspect concerns evaluating whether KG embedding models predict combinations of facts that do not contradict each other. Each fact might be perfectly valid according to the ontology when considered on its own, while the combination of predictions violates the schema/ontology. Extending the above example, the model might make two predictions, "Joe lives in Chicago" and "Joe has profession president". Each prediction considered in isolation is meaningful and semantically correct, but jointly they violate the above axiom.

- In the current version of the paper, the authors restrict themselves to entities for which types are specified in the KGs. This is generally somewhat limiting (as the authors also acknowledge). In principle, KG embedding models are also capable of predicting types themselves. Thus, generalizing the proposed metrics to account for combinations of predictions (i.e., predicted types together with predicted triples) seems to be a rather intuitive and natural extension.

- While it might be too demanding to ask for the extensions of the proposed metrics suggested above to be included in the main part of the paper and in the experiments, I think a broader view of the concept of semantic awareness, touching upon mutual predictions made by the model and more expressive ontologies, could be helpful. This could be done, for example, in a separate Discussion section.
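The two situations raised in the comment above on more expressive ontologies (a prediction violating an axiom, and two predictions that are only contradictory together) can be illustrated with a minimal sketch. It hard-codes the hypothetical axiom "presidents live in capitals" as the rule hasProfession(x, President) and livesIn(x, y) imply Capital(y), and checks it over a set of triples. The entity, relation, and class names, as well as the functions `satisfies_range` and `violations`, are illustrative assumptions; a real implementation would rather delegate such checks to an OWL reasoner or a rule engine.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical class assertions and range restriction (toy data).
TYPES: Dict[str, Set[str]] = {
    "Chicago": {"City"},
    "Washington": {"City", "Capital"},
}
RANGES: Dict[str, Set[str]] = {"livesIn": {"City"}}


def satisfies_range(triple: Triple) -> bool:
    """Range check on a single triple, in the spirit of sem@K."""
    _, relation, tail = triple
    return bool(TYPES.get(tail, set()) & RANGES.get(relation, set()))


def violations(triples: Iterable[Triple]) -> List[Tuple[str, str]]:
    """(person, place) pairs violating the hypothetical axiom
    hasProfession(x, President) and livesIn(x, y) -> Capital(y)."""
    facts = set(triples)
    presidents = {h for h, r, t in facts if r == "hasProfession" and t == "President"}
    return sorted((h, t) for h, r, t in facts
                  if r == "livesIn" and h in presidents
                  and "Capital" not in TYPES.get(t, set()))


kg = {("Joe", "hasProfession", "President")}
p1 = ("Joe", "livesIn", "Chicago")
p2 = ("Joe", "hasProfession", "President")

# Case 1: a single prediction checked against the KG.
print(satisfies_range(p1))    # True: Chicago is a City
print(violations(kg | {p1}))  # [('Joe', 'Chicago')]: the axiom is violated

# Case 2: two predictions (empty KG), each fine in isolation, contradictory together.
print(violations({p1}), violations({p2}))  # [] []
print(violations({p1, p2}))                # [('Joe', 'Chicago')]
```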

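For the comment above on entities without asserted types, one possible direction, sketched here purely under stated assumptions, is to fall back on the model's own type predictions, i.e., its scores for (e, rdf:type, c) triples, whenever a candidate entity has no asserted type. The score dictionary, the 0.5 threshold, and the function names below are hypothetical; raw KGEM scores are generally not calibrated probabilities, so such a threshold would have to be chosen per model.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy inputs (not from the paper):
#   asserted:  entity -> classes stated in the KG
#   predicted: entity -> (class, score) pairs for (entity, rdf:type, class),
#              as scored by the embedding model itself
asserted: Dict[str, Set[str]] = {"Chicago": {"City"}, "Red": {"Color"}}
predicted: Dict[str, List[Tuple[str, float]]] = {
    "Springfield": [("City", 0.8), ("Person", 0.1)],
    "Blorb": [("Color", 0.3)],
}
ranges: Dict[str, Set[str]] = {"livesIn": {"City"}}


def effective_types(entity: str, threshold: float = 0.5) -> Set[str]:
    """Asserted classes if any; otherwise the model's predicted classes
    whose score reaches the threshold."""
    if asserted.get(entity):
        return set(asserted[entity])
    return {c for c, score in predicted.get(entity, []) if score >= threshold}


def sem_at_k_predicted_types(relation: str, ranked: List[str], k: int,
                             threshold: float = 0.5) -> float:
    """Share of the top-k candidate tails whose effective types meet the range.
    Candidates with no asserted and no confidently predicted type count as invalid."""
    top = ranked[:k]
    valid = sum(bool(effective_types(e, threshold) & ranges[relation]) for e in top)
    return valid / k


ranked = ["Springfield", "Red", "Chicago", "Blorb"]
print(sem_at_k_predicted_types("livesIn", ranked, 4))       # 0.5: Springfield and Chicago
print(sem_at_k_predicted_types("livesIn", ranked, 1, 0.9))  # 0.0: City score 0.8 < 0.9
```

Counting untyped candidates as invalid is only one option; excluding them from the denominator, closer to the paper's current restriction to typed entities, is another.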
Additionally, the paper should be carefully proofread once more, as quite a few typos and grammatical inaccuracies remain:

- Abstract: "Its joint analysis with rank-based metrics offer" -> "...offers"
- p. 2: Fig. 2 is referred to as a motivating example, but it appears much later in the text (p. 8). This is rather unusual; I think it would be more intuitive to present the motivating example at the beginning of the paper, where it is first referenced.
- p. 6: "...values increases..." -> "...values increase..."
- p. 7: "... it is assumed the test set only comprises..." -> "... it is assumed that the test set only comprises..."
- p. 7: "...Model B semantic awareness" -> "...semantic awareness of Model B..."
- p. 7: "As aforementioned..." -> "As mentioned above..."
- p. 10: "...is the number of edges linking c to c'..." -> "...is the length of the path from c to c'"?
- p. 11: "Accordingly to Section 4.2.1..." -> either "According to Section 4.2.1" or "As discussed in Section 4.2.1..."
- p. 13: "...the semantic awareness of the most popular KGEMs are analyzed." -> "... is analyzed"
- p. 13: "...with d a distance function..." -> "...where d is a distance function..."
- p. 16: "...and are provided..." -> "...are provided..."
- p. 17: "...are better able at recovering..." -> "...are better at recovering..."
- p. 18: In Fig. 5 (c), ComplEx seems to be missing. Is there a particular reason for that?
- p. 18: "Where translational and semantic matching models treat..." -> "While translational and semantic matching models treat..."
- p. 18: "...a trade-off exist" -> "...exists"
- p. 19: "...the most of KGEMs reaches..." -> "...most KGEMs reach..."
- p. 20: "...with a hierarchy class..." -> "...with a class hierarchy..."
- p. 20: "... are better able at recovering" -> "...are better at recovering"
- p. 21: "... the performance of KGEMs in terms of rank-based metrics are not..." -> "... is not"
- p. 21: "...study for a future work..." -> "...study for future work..."
