MetaData: Qualifying Web Objects
Workshop Osnabrück 13. - 15. October 1997
Report

Roland Schwänzl

The talks centered on four headings:

Descriptive MetaData
Structure embedded in Documents
Tools for Content Analysis
Concrete Retrieval Environments

The idea is to view these items as components that improve resource discovery and as tools that facilitate access to content in an electronic environment.

Descriptive MetaData

MetaData is a means to transport structured assertions about resources.

Two talks showed the progress made with descriptive MetaData since the first MetaData Workshop, organised by the AK MetaDaten und Klassifikation the year before at Göttingen: Stu Weibel spoke on DC (DublinCore), a standardized namespace, to borrow XML terminology, and Renato Janello on W3C's effort to provide an XML application (RDF) that implements the ideas of the Warwick Framework.

A few days before the Göttingen workshop, the first stable version of the plain 15-element DublinCore was published.

Since then, the DC Workshops at Canberra and Helsinki have opened the path to DublinCore with qualifiers. Qualifiers allow, for instance, (community-specific) refinements of the magic 15 and the use of a variety of subject schemes.

The DublinCore initiative worked with W3C on a better definition of the attributes of the HTML META tag. The result has entered the recent W3C recommendation for HTML 4.0.
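
As an illustration of DublinCore carried in HTML META tags, here is a minimal Python sketch. It assumes the "DC." prefix convention for element names and uses the name, scheme and content attributes of the HTML 4.0 META element. The function name dc_meta_tags and the sample record are hypothetical and were not part of any talk.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value, scheme) triples as HTML META tags.

    Assumption: element names are written with a "DC." prefix; the
    optional scheme names the encoding or vocabulary of the value.
    """
    lines = []
    for element, value, scheme in record:
        attrs = [f'name="DC.{escape(element)}"']
        if scheme:
            attrs.append(f'scheme="{escape(scheme)}"')
        attrs.append(f'content="{escape(value)}"')
        lines.append("<meta " + " ".join(attrs) + ">")
    return "\n".join(lines)


# Hypothetical sample record describing this report.
record = [
    ("Title", "MetaData: Qualifying Web Objects", None),
    ("Creator", "Roland Schwänzl", None),
    ("Date", "1997-10-13", "ISO8601"),
]

print(dc_meta_tags(record))
```

The output is a flat list of tags. Nothing in it says which tags belong together, which is where a grouping mechanism would be needed.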

The next step in the collaboration with W3C is the development of RDF, the Resource Description Framework, which expresses the semantics encoded in DublinCore as an XML namespace.

The more powerful syntax of RDF will remove quite a lot of the trouble with HTML Meta. RDF comes, for instance, with a grouping mechanism, which HTML Meta lacks even in version 4.0.

Grouping is already necessary if one just wants to provide a single set of MetaData for, say, a PostScript and a PDF version of the same resource.

Development of RDF has now (October 1997) reached the level of ``public draft'', meaning that current implementations cannot expect to be fully supported by the eventual W3C recommendation.

The need for standardization arises in resource discovery in a globally distributed environment such as the Internet, in linking user queries to the content of databases, in providing information about the terms and conditions of access to a resource, and in linking information from different databases. Eventually MetaData will also enter into the presentation (visualisation) of the information found to the human user.

Digital signatures, not explicitly mentioned during the workshop, will eventually fit with RDF as well.

Two different approaches to descriptive MetaData were mentioned. Barghorn explained the SGML DTD ISO 12083 (Majour Header), developed by publishers up to 1991, that is, in the pre-Web period, and Haber pointed out the related approach in MEDOC.

There is considerable overlap between ISO 12083 and part of the semantics of DC. It would be interesting to have a crosswalk for the semantics and, building on that, an automatic converter of Majour Headers to RDF format to achieve interoperability.

Of course, classical library cataloguing schemes also have to be considered in this context. Their relation to the DublinCore approach was already dealt with the year before at the Göttingen MetaData Workshop. Specific experience with digitized images is reported in [M. Larsgaard].

An approach similar to RDF seems to underlie Puder's proposal to define service trading.

A particular issue with Web resources is the URN. A recent (experimental) implementation of part of the expected functionality is the DOI, sponsored by some publishing houses. DOIs are coded as mostly numerical URLs. They call a database server, which resolves the DOI with an HTTP redirect.

Example of a DOI (30 Dec 1997):

http://hdl.handle.net/10.1007/0938-8990(199701)8:1<21:AMGLMO>2.0.CO;2-B

Recently (27 Jan 1998) the publisher 10.1007 has chosen a simpler syntax for its DOIs:

http://hdl.handle.net/10.1007/s0001239700222
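
The structure of the two examples above can be sketched in Python as follows. This is only an illustration: the helper names split_doi and doi_to_url are hypothetical, and the choice to percent-encode characters such as < and > in the URL is an assumption, not something stated by the DOI organisation. No network request is made.

```python
from urllib.parse import quote

# Resolver address taken from the examples above.
RESOLVER = "http://hdl.handle.net/"


def split_doi(doi):
    """Split a DOI into the publisher prefix and the item suffix."""
    prefix, suffix = doi.split("/", 1)
    return prefix, suffix


def doi_to_url(doi):
    """Build the resolver URL for a DOI.

    Assumption: characters that are unsafe in URLs, such as '<' and '>',
    are percent-encoded; '/', '(', ')', ':' and ';' are left as they are.
    """
    return RESOLVER + quote(doi, safe="/():;")


old_style = "10.1007/0938-8990(199701)8:1<21:AMGLMO>2.0.CO;2-B"
new_style = "10.1007/s0001239700222"

for doi in (old_style, new_style):
    prefix, suffix = split_doi(doi)
    print(prefix, suffix, doi_to_url(doi))
```

Both forms share the prefix 10.1007, which identifies the publisher; only the suffix syntax has changed.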

The interested reader may want to check the server of the DOI organisation for up-to-date information.

Structure embedded in Documents

Here one has to mention MathML and CML, two languages under development. MathML perhaps has the potential to become a successor to TeX in the long run. It will allow computer algebra systems to be used to give mathematical content interactive capabilities [P. Ion]. This intention of MathML in particular is pursued more concretely by the Open Math group [W. Werner].

The specification of MathML is not complete, and no ``real world'' examples exist yet.

One application [W.D. Ihlenfeldt] of the Chemical Mark-up Language (CML) is the efficient storage of molecule information, allowing graphical output with functional operations on it (via Java applets). In contrast to MathML, the Chemical Mark-up Language already enjoys a useful implementation.

In both cases, the MetaInformation living in the structure of the documents is what makes interactive capabilities possible and gives retrieval questions a new (non-classical?) flavour. (In relation to CML, the idea of putting content description into MIME types was mentioned. This seems to overload the functionality of MIME types.)

Tools for Content Analysis

Complex field-specific thesauri appeared both as a means of post-summarizing (automatic classification) and as a tool to be integrated into user interfaces. The workshop showed both demand for such thesauri and the will to build them. Such thesauri should enable a machine to catch information that is near to a given item [M. Hazewinkel].

Hazewinkel estimated 120 000 vertices for a dynamical (simplicial) object useful to locate mathematical papers.

TOSCANA (http://www.mathematik.th-darmstadt.de/ags/ag1/software/ToscanaDemo/ToscanaDemo.html) [R. Wille] is a system that helps users find their way through a structured information space by supplying local graphs.

Osiris [H. Zillmann] takes the path of a linguistic analysis of meaningful user input to locate information. There is ongoing work also with automatic classification.

These tools present themselves as modules that could also serve as components of advanced retrieval systems.

Concrete Retrieval Environments

The architecture of the MEDOC system developed during the last two years was sketched in Haber's talk. Currently there is one test installation running for retrieval of Computer Science literature.

More widespread is the use of Harvest. Its use is rather convenient with MetaData coded in HTML META, as Harvest turns the NAME of an HTML META tag into a searchable field.
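
The idea of turning META names into searchable fields can be sketched in Python with the standard library's html.parser. This is not Harvest's own code; the class MetaFieldParser and the function extract_fields are hypothetical names chosen for illustration, and lower-casing the field names is an assumption made to keep lookups simple.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags as field -> list of values."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            # A field may repeat, e.g. several creators.
            self.fields.setdefault(name.lower(), []).append(content)


def extract_fields(html_text):
    """Return the META fields of an HTML document as a dictionary."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    return parser.fields


# Hypothetical sample page.
page = """<html><head>
<META NAME="DC.Title" CONTENT="Math-Net">
<meta name="DC.Creator" content="A. Author">
<meta name="DC.Creator" content="B. Author">
</head><body></body></html>"""

fields = extract_fields(page)
print(fields)
```

An indexer could then answer a query such as "dc.creator contains Author" by looking only at the values stored under that field.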

Its use was demonstrated for Math-Net [W. Dalitz], whose objective is to provide quality Internet services for Mathematics, and as part of a digital library project (ELib).

In particular with Math-Net Harvest's collaboration with HTML META is essential.

Harvest appeared as living software. Several recent enhancements were mentioned, in particular a module that turns its gatherer into a configurable robot, which allows for truly incremental gathering [J. Plümer].

An interoperable approach to an information system in Physics was presented by E. Hilf and Th. Severiens.

It may be a good idea to consider a more complete re-implementation of the basic Harvest approach to allow for the modular use of ``intelligent'' tools.

EULER is a project that brings together MetaInformation from different sources, using DublinCore as a ``gateway''. It will link the content information of the Zentralblatt für Mathematik with the Göttingen PICA-based library catalogue, thereby adding the functionality of fast document delivery to the Zentralblatt.

Detailed material available for the workshop is accessible via the workshop homepage.

References

Stuart Weibel, Dublin Core - State of the art after DC5
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/weibel19.12.97.html
DublinCore Initiative, DublinCore Home
http://purl.oclc.org/metadata/dublin_core/
W3C, Extensible Markup Language (XML)
http://w3c.org/XML/
W3C, Metadata and Resource Description
http://w3c.org/Metadata/
AK MetaDaten und Klassifikation, MetaDaten und Strukturierung elektronischer Information
http://www.mathematik.uni-osnabrueck.de/ak-technik/anlagen/vortrag.html
K. Barghorn, Preparing Documents for Electronic Publishing
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/barghorn/
C. Haber, The MEDOC Project
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/haber16.10.97.ps
A. Puder, Using meta-level specifications for the service trading in open distributed systems
http://www.vsb.cs.uni-frankfurt.de/~puder/
DOI Home, DOI
http://www.doi.org
W3C, HTML-4.0 Recommendation
http://w3c.org/Press/HTML4-REC
DublinCore Initiative, The 5th Dublin Core Metadata Workshop Helsinki
http://linnea.helsinki.fi/meta/DC5.html
DublinCore Initiative, The 4th Dublin Core Metadata Workshop Canberra
http://www.dstc.edu.au/DC4/
W3C, MathML
http://w3c.org/Math/
Peter Murray-Rust, Chemical Markup Language - CML
http://www.venus.co.uk/omf/cml/
Wend Werner, MathML and OpenMath
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/wend.html
Patrick D. F. Ion, Getting Math on the Web
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/abstracts/ion.html
Wolf-Dietrich Ihlenfeldt, The Role of the Chemical Markup Language (CML)
http://www2.ccc.uni-erlangen.de/Wolf_Ihlenfeldt/slides/cml/index.html
R. Wille, TOSCANA
http://www.mathematik.th-darmstadt.de/ags/ag1/software/ToscanaDemo/ToscanaDemo.html
H. Zillmann, OSIRIS
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/abstracts/zillmann.html
Harvest Home, Harvest
http://harvest.transarc.com/
Harvest Indexer, Harvest Work Group - Tardis - (Uni Edinburgh)
http://www.tardis.ed.ac.uk/harvest/
W. Dalitz, Math-Net
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/abstracts/dalitz.ps
ELib Osnabrück, ELib Home
http://elib.uni-osnabrueck.de
Judith Plümer, Components of an Electronic Library
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/pluemer.html
E. Hilf, MetaData for quality control of information in Physics
http://elfikom.physik.uni-oldenburg.de/bmbf/slot4/docs//osna-141097.html
Th. Severiens, The EuroPhysNet-Project
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/severien5.1.98.html
M. Jost, EULER
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/jo23.10.html
M. Hazewinkel, Concept Building from Keyphrases
http://dbs.cwi.nl/cwwwi/owa/cwwwi.print_projects?ID=62
M. Larsgaard, Metadata applied to Digitized Images
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/larsgard7.12.html
MetaData: Qualifying WebObjects, Workshop HomePage
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/

AK MetaDaten und Klassifikation der IuK Kommission wissenschaftlicher Fachgesellschaften
upload: 21 Jan 1998, last modified: 22 Jan 1998; URL: http://www.mathematik.uni-osnabrueck.de/projects/workshop97/os.html


Annotations

M. Hazewinkel, Topologies and metrics on information spaces
http://www.mathematik.uni-osnabrueck.de/projects/workshop97/papers/haze31.3.98.ps.gz
upload: 7 Apr 1998