Journal Title
Title of Journal: BioChip J
Abbreviation: BioChip Journal
Publisher
The Korean BioChip Society (KBCS)
Authors: ByeongChul Kang, ZeeWon Sur, Chulhwan Park, Mangi Cho
Publish Date: 2010/11/20
Volume: 4, Issue: 4, Pages: 336-349
Abstract
A document search in PubMed is one of the most exhaustive ways of finding information on any biological or biomedical topic. However, a keyword search that is not specific enough returns far more documents than a user can read through one by one. In this work, we therefore present MedClus, a new document clustering tool for bioinformaticians that makes a PubMed keyword search result more concise by grouping the retrieved documents into clusters. MedClus contains two modules. The first is a pre-clustering module that creates the data matrix. This matrix contains term-document frequencies computed with the TF-IDF method, plus optional weights. These weights are assigned by comparing the term list with the MeSH terms contained in the related MEDLINE abstracts. The second is a clustering module based on a Non-negative Matrix Factorization algorithm, which finds an approximate factorization of the data matrix. The application was tested in several experiments that evaluated its performance and reliability. Based on these results, a list of recommended ranges for crucial parameters, such as the number of clusters, was compiled to assist users of MedClus. Finally, some results were analyzed by scientists from the fields of medicine and biology, who evaluated the relevance of the terms and whether the terms were related to one another. MedClus is a tool that restructures the result list of a PubMed keyword search. It does this by extracting terms beforehand and finding latent semantics during the clustering process. It also optionally applies weights to terms that appear as MeSH terms in at least one of the MEDLINE abstracts. It therefore helps users refine a PubMed search result through term-based clustering, saving time and effort. At this stage of development, the software is suitable for experienced users such as bioinformaticians, database administrators, and developers. In addition, the web service for the Semantic Toxicogenomics Knowledgebase, available at http//stkb2labkmnet, has applied this technology to provide comprehensive and accurate relations between chemical and toxicological contexts.
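The two-module pipeline described in the abstract can be illustrated with a minimal sketch. This is not the MedClus implementation: the abstract does not say which NMF variant or weighting scheme is used, so the code below assumes the standard multiplicative-update rules for NMF, a smoothed IDF, and a fixed boost factor for terms that match MeSH terms. All function names, the `mesh_weight` value, and the toy documents are hypothetical.

```python
import math
import random

def tfidf_matrix(docs, mesh_terms=frozenset(), mesh_weight=2.0):
    """Pre-clustering step: build a terms x documents TF-IDF matrix.

    Terms found in mesh_terms are multiplied by mesh_weight (an assumed
    weighting scheme, chosen only for illustration).
    """
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    V = []
    for term in terms:
        df = sum(term in toks for toks in tokenized)
        idf = math.log(n_docs / df) + 1.0  # +1 keeps terms present in every document nonzero
        boost = mesh_weight if term in mesh_terms else 1.0
        V.append([toks.count(term) * idf * boost for toks in tokenized])
    return terms, V

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, n_iter=300, seed=0, eps=1e-9):
    """Clustering step: approximate V (terms x docs) by W (terms x k) times H (k x docs).

    Uses multiplicative updates, which keep W and H non-negative.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def assign_clusters(H):
    """Assign each document to the factor with its largest coefficient in H."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]

# Toy stand-ins for abstracts returned by a keyword search (hypothetical data).
docs = [
    "gene expression microarray gene",
    "microarray gene expression profiling",
    "liver toxicity chemical exposure",
    "chemical toxicity liver damage",
]
terms, V = tfidf_matrix(docs, mesh_terms={"toxicity"})
W, H = nmf(V, k=2)
labels = assign_clusters(H)
print(labels)
```

On this toy input the two gene-expression documents and the two toxicity documents should fall into separate clusters, although which cluster receives which label depends on the random initialization. The number of clusters `k` is the kind of parameter for which the abstract says recommended ranges were compiled.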