Do not seek to follow in the footsteps of the men of old; seek what they sought. - Basho -
When the way comes to an end, then change; having changed, you pass through. - I Ching -
Sit down before fact as a little child, be prepared to give up every conceived notion, follow humbly wherever and whatever abysses nature leads, or you will learn nothing. - Thomas Huxley -
Whoever undertakes to set himself up as a judge in the field of truth and knowledge is shipwrecked by the laughter of the Gods. - Einstein -
KoDDAS : 2019 - Present
  • Korean Diaspora Digital Archive System (KoDDAS) is a 6-year National Research Foundation project to develop an integrated database system that utilizes augmented intelligence and faceted classification. The purpose of KoDDAS is to provide a digital data management infrastructure for aggregating, organizing, and disseminating Korean Diaspora data for research communities as well as the general public. In that regard, KoDDAS extends the conventional archival functions of collection and preservation to focus on information dissemination and data reuse. To promote data reuse, KoDDAS will implement an integrated online database system that contains not only the raw data, but also the contextual and structural metadata that can facilitate the understanding and analysis of Korean Diaspora data.
JISTaP : 2002 - Present
  • Journal of Information Science Theory and Practice (JISTaP) is an ongoing joint project with the Korea Institute of Science and Technology Information (KISTI) for publishing and managing the international journal "Journal of Information Science Theory and Practice". The project, whose aim is to establish a roadmap for creating and maintaining a renowned international journal, involves implementing effective strategies and robust procedures for high-quality manuscript submission and peer review, as well as prototyping and streamlining an open access publication platform.
CiteSearch : 2005 - Present
  • The CiteSearch project, a component of VCoB, develops a multi-faceted fusion approach to information quality assessment that employs a range of citation-based methods to analyze data from multiple sources.
  • The CiteSearch system currently consists of the sf-vcob component, which harvests and analyzes citation data of faculty publications, and the mini-vcob component, which collects and organizes scholarly publications and associated citation data on a given topic.
IASSJ : 2017 - 2019
  • Integrated Analysis of Social Science Journals (IASSJ) is a National Research Foundation project that explores development strategies for social science journals in Korea. The project investigates differences between internationally indexed journals and domestic journals in Korea using multi-faceted methods that range from surveys and interviews of scholars to analysis of journal editorial boards and publication data.
TREC : Multiple tracks under the TREC (Text Retrieval Conference) domain

Blog Track : 2006 - 2008

A new track in TREC 2006. The goal of the Blog track is to explore information seeking behavior in the blogosphere. The task of the Blog track is to retrieve blog postings that express an opinion about a given target. A subtask is to determine the opinion polarity (i.e., positive, negative, mixed).

SPAM Track : 2005 - 2006

A new track in TREC 2005. The goal of the SPAM track is to provide a standard evaluation of current and proposed spam filtering approaches, thereby laying the foundation for the evaluation of more general email filtering and retrieval tasks. The task of the SPAM track is to create an automatic spam filter that classifies a chronological sequence of email messages as SPAM or HAM (non-spam). The spam filter is run on several email sequences, some public and some private. The performance of the filter is measured against gold standard judgements made by a human assessor.
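
As a rough illustration of the evaluation setup described above, the following sketch runs a filter over a labeled message sequence in order and reports the fraction of ham and of spam that it misclassifies. The function names, the keyword rule, and the sample messages are hypothetical placeholders, not the track's actual tooling, data, or official measures.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_filter(classify: Callable[[str], str],
                    messages: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Run a filter over (text, gold_label) pairs in order and tally errors."""
    counts = {"ham": 0, "spam": 0}
    errors = {"ham": 0, "spam": 0}
    for text, gold in messages:  # messages are assumed to be in chronological order
        counts[gold] += 1
        if classify(text) != gold:
            errors[gold] += 1
    return {
        "ham_misclassified": errors["ham"] / counts["ham"] if counts["ham"] else 0.0,
        "spam_misclassified": errors["spam"] / counts["spam"] if counts["spam"] else 0.0,
    }


def keyword_filter(text: str) -> str:
    """Toy placeholder filter: flags any message containing 'free money'."""
    return "spam" if "free money" in text.lower() else "ham"


# Hypothetical gold-standard sequence: (message text, assessor label).
sample = [
    ("Meeting moved to 3pm", "ham"),
    ("FREE MONEY inside, click now", "spam"),
    ("Cheap pills, no prescription", "spam"),
    ("Lunch tomorrow?", "ham"),
]

result = evaluate_filter(keyword_filter, sample)
print(result)
```

Reporting the two error rates separately reflects the usual concern that losing a legitimate message costs more than letting a spam message through.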

Genomics Track : 2004

Due to the explosion of new data-intensive technologies for sequencing and examining genomes and proteomes during the '90s, the TREC Genomics Track was started in 2003. Its goal is to explore new methods for efficiently discovering and retrieving documents associated with the function of various genes and proteins within a given biology domain or 'sub-area'.

HARD Track : 2004 - 2005

The goal of HARD (High Accuracy Retrieval from Documents) is to achieve high accuracy retrieval from documents by leveraging additional information about the searcher and/or the search context, through techniques such as passage retrieval and very targeted interaction with the searcher.

Robust Track : 2003 - 2005

The Robust Retrieval Track was a new track introduced in TREC 2003. The goal of the track is to improve the consistency of retrieval technology by focusing on poorly performing topics. In addition, the track brings back a classic, ad hoc retrieval task to TREC that provides a natural home for new participants.

Web Track : 2003 - 2004

The goals of Web Track are:
1. To investigate methods for effective topic distillation: Finding a set of the best home pages, given a broad query.
2. To investigate methods for effective navigational search, with a mixture of home page and named page queries: Finding a particular page desired by the user.
3. To increase the available queries/judgments for the .GOV test collection.
VCoB : 2004 - Present
  • The aim of the Virtual Collection Builder (VCoB) project is to develop an adaptive, interactive agent for building and maintaining a virtual collection of Web documents.
  • The VCoB approach will employ a wide array of methods from content analysis, citation analysis, machine learning, and information retrieval to collect, organize, and maintain a searchable and browsable collection of documents that are custom-tailored to the needs and preferences of individual users.
DGov : 2004 - 2006
  • The aim of the Digital Government (DGov) project is to develop a more efficient and effective approach to organizing the U.S. Government websites that can facilitate the access and enhance the retrieval of government information.
  • The DGov approach will combine information retrieval (e.g. keyword search) and information organization methods (e.g. Semantic Web) to optimize the government information discovery process on the Web.
jiTTDL : 2004 - 2007
  • JiTTDL is an NSF-funded project at the US Air Force Academy that will develop a digital library for Just-in-Time pedagogical resources. Dr. Elin Jacob and I are consultants on the JiTTDL project, where we are employing CSKD-based approaches to construct a prototype digital library system.
CSKD : 2003 - 2009
  • The Classification-based Search and Knowledge Discovery (CSKD) project aims to leverage an existing body of manually classified documents to enhance information retrieval and knowledge discovery on the Web. CSKD research, which explores methods of leveraging both the ontological and link-structural knowledge embedded in classified corpora of Web documents for searching and organizing the Web, is a multi-dimensional project that entails investigations in such areas as machine learning, classification, clustering, link analysis, and fusion.
WIDIT : 2002 - Present
  • IRISWeb technology to be revamped for an integrated approach to Web information discovery.
  • Leveraging of text, hyperlink, and classification information on the Web for interactive retrieval, automatic classification, and virtual collection development.
IRISWeb : Spring 1998 - 2002
  • Kick-started by Chancellor’s UNC Instructional Technology Grant (spring 1998)
  • Work-in-progress prototype of the next-generation web search engine
  • Dynamic indexing, meta-search, relevance-feedback, and collection development
  • Implemented in Java, C++, C, and Perl.
Public User Data Navigation System : 1999 - 2000
  • Design and implementation of a navigation system for NIH to help users find data of interest and the associated documentation.
  • To be considered for use as a template for other data navigation systems.
  • Implemented in Java.
Sitemap Tree Project : Spring 1999
  • Dynamic tree manipulation application that allows creation and maintenance of a hierarchical tree structure, which can represent various entities from a file system (e.g. Windows Explorer) to a hyperlink-based map of a given Web site.
  • Implemented in Java.
Web Search Evaluation : Fall 1998
  • A Web-based Search Engine evaluation survey.
  • Precursor to the Meta-Search Engine Evaluation.
Link Summarizer : Spring 1998
  • Displays a summary of linked pages in a given URL.
  • Modified version of IRIS crawler.
  • Implemented in Java and Perl.
IRIS Sitemap-Crawler : Fall 1997
  • Web Indexing Interface for IRISWeb.
  • Web Crawler/Indexer module indexes a target URL and its embedded links.
  • Sitemap Display module displays the indexed pages in a directory tree structure.
  • Implemented in Java.
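
The directory-tree display described above can be sketched roughly as follows. This is an illustrative Python example, not the original Java implementation: the helper names `build_tree` and `render` and the `example.org` URLs are hypothetical, and the list of pages stands in for whatever the crawler/indexer module would have collected.

```python
from urllib.parse import urlparse


def build_tree(urls):
    """Group URLs into a nested dict keyed by host, then by each path segment."""
    tree = {}
    for url in urls:
        parsed = urlparse(url)
        parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
        node = tree
        for part in parts:
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render(tree[name], depth + 1))
    return lines


# Hypothetical set of pages already indexed from a target URL and its links.
pages = [
    "http://example.org/",
    "http://example.org/docs/intro.html",
    "http://example.org/docs/api/index.html",
    "http://example.org/about.html",
]

sitemap = render(build_tree(pages))
print("\n".join(sitemap))
```

A nested dictionary keeps the sketch short; a real display module would likely attach page titles or index metadata to each node.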
IRIS Multimedia Project : Fall 1997
  • IRIS online tutorial and next-generation interface prototype.
  • Implemented in Macromedia Director and JavaScript.
NICE STEMMER : Fall 1996
  • Implements 4 stemmer algorithms.
  • Component of the IRIS indexing module.
  • Implemented in C++.
IRIS Topaz : 1997-1998
  • An upgraded version of IRIS Ruby designed to handle large document collections.
  • Used in TREC-6 and TREC-7 experiments.
  • Implemented in Perl, C, C++.
IRIS Ruby : Spring 1996
  • The first incarnation of IRIS (Interactive Information Retrieval System).
  • Designed to index and search documents stored on an intranet server.
  • Implemented in Perl.
SOCSA : Fall 1996
  • SILS Course Advising System (prototype).
SEED : Spring 1996
  • SILS Online Directory (prototype).
CANN : Fall 1995
  • WWW Asian American Resource Network.