Determining Attributes to Maximize Visibility of Objects
In recent years, there has been significant interest in the development of ranking functions and efficient top-k retrieval algorithms to help users in ad-hoc search and retrieval in databases (e.g., buyers searching for products in a catalog). We introduce a complementary problem: how to guide a seller in selecting the best attributes of a new tuple (e.g., a new product) to highlight so that it stands out in the crowd of existing competitive products and is widely visible to the pool of potential buyers. We develop several formulations of this problem. Although the problems are NP-complete, we give several exact and approximation algorithms that work well in practice. One type of exact algorithms is based on Integer Programming (IP) formulations of the problems. Another class of exact methods is based on maximal frequent itemset mining algorithms. The approximation algorithms are based on greedy heuristics. A detailed performance study illustrates the benefits of our methods on real and synthetic data.
Muhammed Miah, Gautam Das, Vagelis Hristidis and Heikki Mannila
I N recent years, there has been significant interest in developing effective techniques for ad-hoc search and retrieval in unstructured as well as structured data re- positories, such as text collections and relational data- bases. In particular, a large number of emerging applica- tions require exploratory querying on such databases; examples include users wishing to search databases and catalogs of products such as homes, cars, cameras, restau- rants, or articles such as news and job ads. Users brows- ing these databases typically execute search queries via public front-end interfaces to these databases. Typical queries may specify sets of keywords in case of text data- bases, or the desired values of certain attributes in case of structured relational databases. The query-answering system answers such queries by either returning all data objects that satisfy the query conditions, or may rank and return the top-k data objects, or return the results that are on the queryâ„¢s skyline. If ranking is employed, the rank- ing may either be simplistic â€œ e.g., objects are ranked by an attribute such as Price; or more sophisticated â€œ e.g., objects may be ranked by the degree of relevance to the query. While unranked retrieval (also known as Boolean Retrieval) is more common in traditional SQL-based data- base systems, ranked retrieval (also known as Top-k Re- trieval) is more common in text databases, e.g. tf-idf rank- ing . Recently there has been widespread interest in developing suitable top-k retrieval techniques even for structured databases [1, 7, 30]. Skyline retrieval semantics is also investigated where a data point is retrieved by a query if it is not dominated by any other data point in all dimensions [4, 19, 22, 25, 29, 31]. In this paper we do not address new search and retrieval techniques that will aid users in effective exploration of such databases. Rather, the focus is on the complemen- tary novel problem of selecting the data to be shown, elaborated as follows.
read full report