TY - GEN
T1 - ODYS
T2 - 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
AU - Whang, Kyu Young
AU - Yun, Tae Seob
AU - Yeo, Yeon Mi
AU - Song, Il Yeol
AU - Kwon, Hyuk Yoon
AU - Kim, In Joong
PY - 2013
Y1 - 2013
N2 - Recently, parallel search engines have been implemented based on scalable distributed file systems such as Google File System. However, we claim that building a massively-parallel search engine using a parallel DBMS can be an attractive alternative since it supports a higher-level (i.e., SQL-level) interface than that of a distributed file system for easy and less error-prone application development while providing scalability. Regarding higher-level functionality, we can draw a parallel with the traditional O/S file system vs. DBMS. In this paper, we propose a new approach of building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS. To estimate the performance, we propose a hybrid (i.e., analytic and experimental) performance model for the parallel search engine. We argue that the model can accurately estimate the performance of a massively-parallel (e.g., 300-node) search engine using the experimental results obtained from a small-scale (e.g., 5-node) one. We show that the estimation error between the model and the actual experiment is less than 2.13% by observing that the bulk of the query processing time is spent at the slave (vs. at the master and network) and by estimating the time spent at the slave based on actual measurement. Using our model, we demonstrate a commercial-level scalability and performance of our architecture. Our proposed system ODYS is capable of handling 1 billion queries per day (81 queries/sec) for 30 billion Web pages by using only 43, 472 nodes with an average query response time of 194 ms. By using twice as many (86, 944) nodes, ODYS can provide an average query response time of 148 ms. These results show that building a massively-parallel search engine using a parallel DBMS is a viable approach with advantages of supporting the high-level (i.e., DBMS-level), SQL-like programming interface.
AB - Recently, parallel search engines have been implemented based on scalable distributed file systems such as Google File System. However, we claim that building a massively-parallel search engine using a parallel DBMS can be an attractive alternative since it supports a higher-level (i.e., SQL-level) interface than that of a distributed file system for easy and less error-prone application development while providing scalability. Regarding higher-level functionality, we can draw a parallel with the traditional O/S file system vs. DBMS. In this paper, we propose a new approach of building a massively-parallel search engine using a DB-IR tightly-integrated parallel DBMS. To estimate the performance, we propose a hybrid (i.e., analytic and experimental) performance model for the parallel search engine. We argue that the model can accurately estimate the performance of a massively-parallel (e.g., 300-node) search engine using the experimental results obtained from a small-scale (e.g., 5-node) one. We show that the estimation error between the model and the actual experiment is less than 2.13% by observing that the bulk of the query processing time is spent at the slave (vs. at the master and network) and by estimating the time spent at the slave based on actual measurement. Using our model, we demonstrate a commercial-level scalability and performance of our architecture. Our proposed system ODYS is capable of handling 1 billion queries per day (81 queries/sec) for 30 billion Web pages by using only 43, 472 nodes with an average query response time of 194 ms. By using twice as many (86, 944) nodes, ODYS can provide an average query response time of 148 ms. These results show that building a massively-parallel search engine using a parallel DBMS is a viable approach with advantages of supporting the high-level (i.e., DBMS-level), SQL-like programming interface.
KW - DB-IR tight integration
KW - Massively-parallel search engines
KW - Parallel DBMSs
UR - http://www.scopus.com/inward/record.url?scp=84880531695&partnerID=8YFLogxK
U2 - 10.1145/2463676.2465316
DO - 10.1145/2463676.2465316
M3 - Conference contribution
AN - SCOPUS:84880531695
SN - 9781450320375
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 313
EP - 324
BT - SIGMOD 2013 - International Conference on Management of Data
Y2 - 22 June 2013 through 27 June 2013
ER -