||Search: Beyond the Keyword Interface
- Prabhakar Raghavan, Verity
- Ricardo Baeza-Yates, University of Chile and Akwan Information Technologies
- Andrei Broder, Altavista
- John Lowe, Ask Jeeves, Inc
- Jan Pedersen, Centrata
- Andrew Tomkins, IBM Almaden Research Center
How can search engines of the future keep up with the growing complexity of information discovery on the web? A host of challenges arise: some require improvements to the way in which users' queries are handled; others suggest that better performance requires changes to the indexing of documents and the generation of results. We outline some of the principal ones below.
Challenges in processing queries
- Identifying and focussing on the user's task at hand: People use search engines to satisfy a huge variety of information needs, ranging from the purely navigational ("find me the home page of Pizza Hut") to the narrowly goal-oriented ("is there a Marriott hotel near my Vancouver office?") to broader goals ("what is the best interest rate I can get on a 6-month deposit certificate for $10,000?"). No single indexing technique can simultaneously address all of these information needs. Further, when a search engine offers a suggestion for a 6-month deposit certificate, what can the user construe about the quality of the result: is it the best result the engine "knows" of, or the biggest advertiser on that engine? As web search engines increasingly come under pressure to demonstrate profitability, should they keep their "editorial" function (give the best answer) distinct from their fiduciary demands (give the most profitable answer)?
- Providing simultaneous, seamless access to multiple heterogeneous information resources: The next generation of search must be capable of extracting and combining information from multiple sources (the Marriott hotel query, or comparing interest rate quotes from multiple banks). What syntactic and semantic support can one provide for such query paradigms?
- Querying structured and semi-structured information: The ideal answer to the question of determining the best interest rate (mentioned above) would be derived from a simple computation on the results of SQL queries against databases at various banks (since these banks would typically store information on rates and plans on a robust repository such as a relational database, then assemble content from the database in response to a request for information). These databases could have different (and, to a general purpose web search engine, unknown) schema. Today, a human would browse and query each of the bank sites to piece together the information, but a search interface of the future would ideally present the user with the means to compose a query that could be run across multiple databases. Similarly, many documents displayed at web browsers now are the results of rendering XML either on the server or client side. How can one build flexible mechanisms for such searches so that they can avail themselves of what structure is present and yet be independent of particular vertical areas?
- Understanding and responding to the appeal of near-natural language interaction: Being able to interact with computers in natural language has been a long-standing and elusive goal. As primitive as our techniques are in the face of the vast diversity of web content (and needs of users), we already see some limited success in the form of engines that give the appearance of understanding the user's query (Ask Jeeves, and other domain-specific services). In many of these cases there is an element of basic search technology combined with domain expertise - how can this become more sophisticated in its scalable reuse of the human labor?
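The federated-query idea behind the deposit-certificate example can be sketched in a few lines. Everything below is a hypothetical illustration, not how any real engine works: two in-memory SQLite databases stand in for the banks' repositories, each with its own (invented) schema, and a small per-source adapter maps each onto the one shared question.

```python
# Minimal sketch of a federated "best interest rate" query: each bank
# keeps its data in a database with its own schema, and a per-source
# adapter maps that schema onto a common (bank, rate) view.
# All schemas, table names, and rates here are hypothetical.
import sqlite3

def make_bank_a():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cd_products (term_months INT, apy REAL)")
    db.execute("INSERT INTO cd_products VALUES (6, 4.75), (12, 5.10)")
    return db

def make_bank_b():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE deposits (name TEXT, months INT, rate_pct REAL)")
    db.execute("INSERT INTO deposits VALUES ('6m CD', 6, 4.90)")
    return db

# One adapter per source hides its schema behind the shared question:
# "what rate do you pay on a 6-month certificate?"
ADAPTERS = {
    "Bank A": (make_bank_a(), "SELECT apy FROM cd_products WHERE term_months = 6"),
    "Bank B": (make_bank_b(), "SELECT rate_pct FROM deposits WHERE months = 6"),
}

def best_rate():
    """Run each adapter's query and return the (bank, rate) with the best rate."""
    quotes = {}
    for bank, (db, query) in ADAPTERS.items():
        row = db.execute(query).fetchone()
        if row:
            quotes[bank] = row[0]
    return max(quotes.items(), key=lambda kv: kv[1])

print(best_rate())  # → ('Bank B', 4.9)
```

The hard part, which the sketch sidesteps, is exactly the open question above: in the wild, the adapters (and the schemas they translate) are unknown to a general-purpose engine and would have to be discovered or described in some machine-readable way.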
Challenges in document processing and presenting results
- Rendering dynamically generated content: Growing volumes of web content are now pieced together from fragments drawn from several back-end repositories (databases and other content-management sources). In such cases, the content that is served up to browsers does not consist of "spiderable" pages, and is thus difficult to extract and index (at least for robotic crawlers). Examples include pages served up by eCommerce sites, as well as "personalized" web pages served up by many companies to their customers. What combination of technical advances and market mechanisms can allow people to discover such content through search engines?
- Maintaining fresh content: When a source is known to change its content frequently (e.g., the CNN home page) it is possible to schedule a crawler to visit it frequently and index recent content. However, new content is constantly appearing on the internet, and inevitably there is a lag before it is indexed and searchable. What is the right tradeoff between the "staleness" of an index and the overhead in network traffic for keeping it fresh? One extreme in this tradeoff is the current wave of "peer-to-peer" (P2P) search engines built on protocols such as gnutella. What is the future of these information discovery methods?
- Adapting to the user: Several companies and research projects are exploring the possibilities of "personalized" search - systems that learn models of the user's behavior and interests, then use the models to resolve ambiguity in searches by re-ranking search results or providing recommendations as a user browses the web. At their worst, such interfaces are annoying; at their best, they are promising but far from perfect. Others use collaborative filtering or other community-based adaptive approaches to refine result sets and the behavior of their systems. What directions and improvements can we expect from such personalization agents?
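The staleness-versus-traffic tradeoff mentioned above can be made concrete with a toy revisit policy: crawl each page at roughly its observed mean time between changes, clamped to a bandwidth-friendly range. The change counts and interval bounds below are illustrative assumptions, not parameters of any real crawler.

```python
# Sketch of freshness-driven crawl scheduling: pages that are observed
# to change more often get shorter revisit intervals, bounded so that
# no page is crawled more than hourly or less than weekly.

def revisit_interval_hours(changes_observed, hours_observed,
                           min_interval=1.0, max_interval=24 * 7):
    """Revisit at roughly the page's mean time between observed changes."""
    if changes_observed == 0:
        return max_interval          # apparently static: crawl rarely
    mean_change_gap = hours_observed / changes_observed
    return max(min_interval, min(max_interval, mean_change_gap))

# A news front page that changed 48 times in a day, versus a page that
# changed once in a month of observation:
print(revisit_interval_hours(48, 24))      # → 1.0 (clamped to the hourly floor)
print(revisit_interval_hours(1, 24 * 30))  # → 168 (clamped to the weekly cap)
```

Even this crude policy exposes the tension in the text: tightening the floor improves freshness for fast-changing sources but multiplies crawl traffic, which is precisely the cost that P2P approaches try to push back onto the sources themselves.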
||Virtual University Challenge
- W. F. Massy (to be confirmed), Jackson Hole Higher Education Group
- Nigel J. French, Education & Manpower Bureau, Hong Kong
- H. K. Chang, City University of Hong Kong
- Paul Bacsich, Sheffield Hallam University
- John Malpas (to be confirmed), The University of Hong Kong
- John Neal, Ottawa University
- John Dockerill, City University of Hong Kong
There are numerous challenges facing both public and private universities
and other established higher education institutions today. These
challenges include: increased demands for accountability from national
and state legislatures and other authorities; pressure to widen
access for students of different ages and ability levels
while at the same time reducing per-student unit costs; competition from
non-traditional providers, including commercial institutions and
corporations, operating on an international and even global scale; and,
above all, the challenge of using information and telecommunications
technology effectively.
Universities are also challenged by their own governing bodies and other
stakeholders to "do more, with less", to maintain and improve the quality
of teaching and learning, to produce high quality research, both basic
and applied, and to serve the community in a variety of other ways.
Using a recently developed piece of virtual reality software,
specifically designed as a learning and training aid, this workshop will
explore some of these challenges through "Virtual-U" game-play and a
panel discussion.
Two highly experienced senior university leaders, Prof. H K Chang,
President of the City University of Hong Kong, and Prof. Paul Bacsich
from Sheffield Hallam University in the UK, will play "Virtual-U" (the
game developed by the leader of the workshop, Prof. W F Massy of the
Jackson Hole Higher Education Group and Stanford University in the USA)
and then discuss with Prof. Massy and a panel of local higher education
experts the issues arising and the lessons learned from the game.
The workshop promises to be both entertaining and educational - a
Virtual University Challenge!
||Semantics for the Web
- Lynda Hardman, CWI Multimedia and Human Computer Interaction group
- Tim Berners-Lee, World Wide Web Consortium
- Dieter Fensel, OntoKnowledge
- Ana B. Benitez, Columbia University
While the HTML document format dominates most of the information presented on the Web, the vision behind the Web has always been that the information accessible on it should be machine-processable as well as human-consumable. Knowledge representation has a long and distinguished history, but it has largely concentrated on capturing expertise in specialised domains. Allowing information to be made machine-processable in an environment as diverse as the web requires a number of underlying assumptions to be reassessed.
A number of international initiatives are already investigating the path to take towards enabling a machine-processable, knowledge-rich information environment. Among these are the RDF and RDFS working groups in W3C, and the DAML and OntoKnowledge projects. In addition, MPEG-7 is one of the key initiatives investigating the complex issues of combining semantics not only with textual information but also with multimedia. We have invited key players from each of these initiatives to explain the problems they are confronted with, which need to be solved before we can achieve the original goal of a Semantic Web.
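The core idea of RDF, facts expressed as (subject, predicate, object) triples that a program can query without understanding the page they came from, can be illustrated with a tiny triple store. The vocabulary and facts below are invented for illustration; real RDF identifies subjects and predicates with URIs.

```python
# Minimal sketch of triple-based, machine-processable facts in the
# spirit of RDF. The predicates and data here are invented examples.

TRIPLES = {
    ("Lynda_Hardman", "worksAt", "CWI"),
    ("Lynda_Hardman", "contributedTo", "SMIL"),
    ("SMIL", "publishedBy", "W3C"),
}

def query(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return {t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# "What do we know about SMIL?" - as a subject and as an object:
print(query(s="SMIL") | query(o="SMIL"))
```

The point of the panel's initiatives (RDF/RDFS, DAML, MPEG-7 descriptions) is to agree on shared vocabularies and schemas so that such pattern-matching works across independently authored sources, not just within one hand-built set of triples.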
About the Speakers:
Lynda Hardman
Research head of the CWI Multimedia and Human-Computer Interaction group. She has been involved with the W3C SMIL recommendation and is currently investigating the relationships among semantics, multimedia and the Web.
Tim Berners-Lee
Director of the World Wide Web Consortium, an open forum of companies and organizations with the mission to lead the Web to its full potential. His vision of the Web was to enable otherwise unconnected information to be linked together, and the Semantic Web is the next step towards this goal.
Dieter Fensel
Project Manager of the EU-funded OntoKnowledge project and EU advisor to the Joint United States / European Union ad hoc Agent Markup Language Committee.
Ana B. Benitez
Graduate research assistant in the Department of Electrical Engineering at Columbia University, involved with the development and editing of MPEG-7 tools for describing the structure and semantics of multimedia.
||Multilingual Identifiers on the Web
- Yves Arrouye, RealNames Corporation
- Larry Masinter, Adobe Systems Incorporated
- James Seng, i-DNS.net
We will focus on multilingual domain names, internationalized resource identifiers, and other topics that have attracted attention recently. The Panel will discuss questions such as:
- are multilingual identifiers needed?
- how can they be integrated into the current infrastructure?
- how do they work together?
- what interests are at stake, and what players are around?
||Vision on e-Learning: Dot-Com Rising?
- P. H. Yang, The University of Hong Kong
- Terry Hilsberg, NextEd.com
- Ann Whyte, Monster Learning
- Steve Yan, EdPort.com
- Hong-Yi Ip, IT Training and Development Centre
"Although web-based education is in its earliest phase, it holds
extraordinary promise." Thus starts the Report issued on 19 December
2000 by Web-based Education Commission, a U.S. Congressional Commission.
The Report, entitled "The Power of the Internet for Learning", championed
the use of the Internet in education, and outlined a national agenda
to promote e-learning for all learners: from pre-kindergarten through
high school, at post-secondary colleges and universities, and in
corporate training and lifelong learning.
The e-Learning market is forecast to reach US$54.1 billion in 2005.
It is the fastest growing and most promising sector in the education
industry. As the Internet changes the way people learn,
both traditional universities and net entrepreneurs are moving into
e-Learning. Investors are pouring millions, and soon billions, into online education.
While the bubble has inevitably burst for the first wave of dot-coms,
there must be a second wave of more robust ventures, which need to
integrate old-economy basics (like revenues, preferably from multiple
sources) with new ways of capitalizing on the potential of the Internet.
e-Learning must be on the leading edge of the second wave.
The panel will explore:
- What is the impact of the challenges from the e-Learning Dot-Coms on traditional universities?
- With the online student dropout rate estimated at around 35%, versus around
20% for college freshmen in the U.S., do traditional universities have anything to fear?
- Is it online training or education? Is there a quality issue? The American Association
of University Professors has so far refused to accredit any institution that does not
have at least some classroom component. Is it prudence or fear?
- Arthur Miller, a Harvard law professor, was prevented by the University from selling a course
he developed during his summer vacation to the online Concord University School of Law.
Who owns the intellectual property rights - the professor or the university or both?
- Is an online degree as good as a classroom-based degree? Will the employers and potential
students embrace it?
- During the first quarter of 2000, 64% of the US$900 million private capital invested in Dot-Coms
went to education companies. Will e-Learning Dot-Coms be leading the next wave of successful
companies in the New Economy?
||Internet Privacy Approaches Around the World
- Professor Ray Wacks, University of Hong Kong
- Mr Stephen Lau, Hong Kong Privacy Commissioner
- Mr Daniel J Weitzner, W3C, Boston
- Mr Rigo Wenning, W3C, France
- Dr John Bacon-Shone, University of Hong Kong
One of the key social issues affecting use of the web is privacy.
Different countries have taken different approaches. Some have
characterised the two extremes as a European approach, based on legal
enforcement of principles across all sectors, both private and public,
versus a US approach that is sector-specific and relies largely on
self-regulation rather than the law. Clearly, this is an oversimplification.
Arguably, Hong Kong has a European style data protection law, but
implemented in a business friendly way.
What can we learn from the similarities and dissimilarities in
approaches across the world to identify the way forward?
||Web Accessibility - A Gentle Introduction
- Dr. Tetsuya Watanabe, Researcher, National Institute of Special Education, Japan
- Judy Brewer, Director, Web Accessibility Initiative (WAI), World Wide Web Consortium (W3C)
Many of us have heard of accessibility, but don't understand how the
effort to make web content "accessible" impacts people with
disabilities. This panel provides a gentle introduction to accessibility
issues, covering accessibility aids like screen readers that "read" web
content for blind people, and browser features like high-contrast color
schemes. You'll also hear how the Web Accessibility Initiative (WAI)
is working to ensure that mainstream Web sites work well for people
with a variety of disabilities, often without the presence of assistive
technologies; and to ensure that browsers, multimedia players, and
authoring tools have accessible user interfaces. After attending this
panel, you'll be ready to take advantage of the wealth of additional
information available on accessibility.
- Moderator's introductory comments - 5 minutes
- Accessibility Aids - discussion and demo of screen reader - Dr.
Watanabe - 20 minutes
- Designing accessible Web sites and Web-based applications - Judy Brewer - 20 minutes
- Browser Accessibility features - Tim Lacy - 20 minutes
- Q&A - 25 minutes
Tim Lacy
Accessibility Program Manager, Microsoft, USA
Tim Lacy is a Program Manager in Microsoft's Accessibility and
Disability Group, focusing on improving the accessibility of Visual
Studio and Internet Explorer. Acting primarily as a consultant and
accessibility advocate within these groups, Tim manages accessibility
bugs and issues between the product groups and the ADG group that
produces Microsoft Active Accessibility, answers accessible design
questions from these and other product groups, and participates in the
W3C WAI User Agent and Evaluation and Repair Working Groups.
Tim joined Microsoft in December 1991 as a SQL Database Administrator
for the Microsoft Corporate Library, and has held several other
positions before becoming an Accessibility Program Manager in 1998. He
has an MBA in Information Systems from City University and, prior to
joining Microsoft, was a Unix system administrator, software
developer, software support engineer, and project manager.
Tetsuya Watanabe, Ph.D.
Researcher, National Institute of Special Education, Japan
In 1996 Dr. Watanabe designed the first Japanese screen reader for the
Microsoft Windows 95 operating system while working at the National
Institute of Vocational Rehabilitation. That work is now well known as
the 95Reader screen reader (a.k.a. 2000Reader for Windows 2000). The
latest release supports Microsoft Active Accessibility (MSAA) and an
open programming interface that lets additional software interoperate
with the screen reader engine. In 2000 Dr. Watanabe led research on the
statistics and usage of Windows PCs by visually impaired persons in
Japan.
The National Institute of Special Education (NISE,
http://www.nise.go.jp/) is one of the leading research centers of
special education for students with disabilities in Japan. NISE is an
independent administrative institution under the Ministry of Education,
Culture, Sports, Science and Technology (http://www.sta.go.jp/). The
System Solution Center Tochigi (SSCT, http://www.ssct.co.jp/) is the
retailer of 95Reader.
Judy Brewer
Director, Web Accessibility Initiative (WAI), World Wide Web Consortium
Judy Brewer joined W3C in September 1997 as Director of the Web
Accessibility Initiative (WAI) International Program Office. She is
Domain Leader for WAI, and coordinates five activities with respect to
Web accessibility: ensuring that W3C technologies support accessibility;
developing guidelines for Web content, browsers, and authoring tools;
developing evaluation and repair tools; conducting education and
outreach; and coordinating with research and development that can affect
future Web accessibility.
Prior to joining W3C, Judy was Project Director for the Massachusetts
Assistive Technology Partnership, a U.S. federally-funded project
promoting access to assistive technology for people with disabilities.
She worked on several national initiatives to increase access to
mainstream technology for people with disabilities and to improve dialog
between industry and the disability community. Judy has a background in
applied linguistics, education, technical writing, management and
disability advocacy. She chairs the Web Accessibility Initiative
Interest Group and the Education and Outreach Working Group.
||Censorship on the Web
- Liddy Nevile, University of Melbourne
- Daniel Dardailler, W3C
- Kostas Chandrinos,
- Rigo Wenning, W3C/ INRIA
- Haakon Lie,
- Susan Eng, Toronto
- Ang Peng Hwa, NTU, Singapore
Issues to be considered include:
- is it possible to censor the web?
- who has the right to prevent another from seeing material?
- what is being done in terms of censorship at a personal, organizational or state level?
- what happens when cultures do not support the same values, but their content crosses borders?
- what new issues are likely to emerge as the web develops?
||XML and Databases: Latest Fad or New Disruptive Technology?
- Evan Lenz, XYZFind Corp
- Don Deutsch, Oracle Corp
- Paul Cotton, Microsoft
- Ronald Bourret
- Susan Malaika, IBM
This Panel will bring together researchers, providers and users of database technology. In particular, the following questions will be addressed:
- do we need to combine XML and databases?
- if so, how: do we need special-purpose XML databases, do relational databases with extra XML layers suffice, or can relational databases absorb XML directly?
- what are the challenging aspects of combining them?
- what do customers need and expect?
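One of the options the panel will debate, keeping XML data in an ordinary relational database by "shredding" elements into rows, can be sketched as follows. The document shape, table layout, and figures are invented for illustration.

```python
# Sketch of "XML in a relational database": shred a simple document's
# elements into rows, then query them with plain SQL. The schema and
# invoice data below are hypothetical.
import sqlite3
import xml.etree.ElementTree as ET

DOC = """
<invoices>
  <invoice id="1"><customer>Acme</customer><total>120.50</total></invoice>
  <invoice id="2"><customer>Globex</customer><total>75.00</total></invoice>
</invoices>
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (id INT, customer TEXT, total REAL)")
for inv in ET.fromstring(DOC).findall("invoice"):
    db.execute("INSERT INTO invoice VALUES (?, ?, ?)",
               (int(inv.get("id")),
                inv.findtext("customer"),
                float(inv.findtext("total"))))

# Once shredded, the full power of SQL applies:
total = db.execute("SELECT SUM(total) FROM invoice").fetchone()[0]
print(total)  # → 195.5
```

Shredding works well when documents are as regular as this one; the panel's harder question is what to do with deeply nested, mixed-content, or schema-less XML, which is where native XML databases and XML query languages enter the argument.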
Michael Rys is Program Manager for SQL Server XML Technologies at
Microsoft. He has extensive knowledge of the field of databases and XML
from both a research and a product development point of view. He is a
member of the W3C XML Query Working Group and editor of its algebra
working draft, and is a member of the team that designs SQL Server's XML
support. Position on panel issues: he has one, but will remain neutral
for the purpose of moderating the panel.
Evan Lenz is an XML developer at XYZFind Corp. in Seattle, WA, where he
helps design XYZFind's native XML database software. His primary area of
expertise lies in XSLT, and he recently joined the W3C XSL Working
Group. In February, he sparked a lively discussion on xml-dev with his
paper, "XQuery: Reinventing the Wheel?". He also served as co-author of
Professional XML 2nd Edition, soon to be published by Wrox Press.
Position in panel: XML databases represent a new disruptive technology.
Don Deutsch is Vice President, Standards Strategy and Architecture, at
Oracle Corporation. Oracle's recently appointed representative on the
W3C Advisory Committee, Don has extensive experience in other standards
and consortia forums. As chairman of the H2 Technical Committee on
Database (a.k.a. the ANSI SQL committee) he has led the development of
SQL language standards for over 20 years. Prior to joining Oracle, Don
held senior development, research and management positions with Sybase,
General Electric and the U.S. National Bureau of Standards. Position on
panel issues: Since XML offers an effective, vendor-neutral format for
data exchange and provides a valuable new paradigm for representing
unstructured data, users and developers benefit most through the tight
integration of XML data into existing data models, applications and
infrastructure.
Paul Cotton is Program Manager for XML Standards at Microsoft. Paul is
Chairman of the W3C XML Query Working Group, a member of the XML
Coordination Group and a member of the XML Protocol Working Group. In
addition Paul is one of the ten elected members of the W3C Advisory
Board. Paul has been active in the XML Activity at W3C since 1998 and
has been involved in the standardization of database query languages and
language bindings since the late 1980's. Paul believes that the XML
Query language being defined by the W3C XML Query WG should be suitable
for mapping down onto file stores of XML documents, SQL databases,
object-oriented databases and native XML repositories. Thus XQuery will
replace SQL as the "intergalactic query language".
Ronald Bourret is a freelance programmer, writer, and researcher. He is
the creator of XML-DBMS, an object-relational engine for transferring
data between XML documents and relational databases, and has written a
number of popular papers on using XML with databases. Position on panel
issues: Relational databases win the market for accounting data, native
XML databases win the market for documents, and there will be a battle
for business documents such as invoices.
Susan Malaika works at the IBM Santa Teresa Lab on XML Technologies for