This is a sample SOIF file:
@FILE { http://harvest.sourceforge.net/
update-time{10}: 1065602907
full-text{718}: Harvest: A Distributed Search System
Harvest: A Distributed Search System
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
Harvest
Harvest Homepage
Miscellaneous Documents and Presentations
Directory Index of Work in Progress Version of Harvest
SourceForge: Project Info - Harvest
Stable Version of Harvest
Homepage of stable Version
Historic Versions of Harvest
Harvest User's Manual 1.4.pl2 (January 31, 1996)
Harvest User's Manual 1.4.pl2 as PostScript
Developers
Kang-Jin Lee
Javier Masa Marin
Harald Weinreich
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
headings{111}: Harvest: A Distributed Search System
Harvest
Stable Version of Harvest
Historic Versions of Harvest
Developers
title{37}: Harvest: A Distributed Search System
url-references{836}: harvest/doc/index.html
harvest/doc/sites.html
harvest/doc/download.html
harvest/contrib/index.html
harvest/doc/todo.html
harvest/doc/links.html
harvest/doc/CONTRIBUTORS
harvest/doc/html/manual.html
harvest/doc/html/FAQ.html
harvest/INSTALL.harvest
harvest/ChangeLog
harvest/NEWS
harvest/doc/index.html
misc/
wip/
http://sourceforge.net/projects/harvest/
harvest-1.8/doc/index.html
harvest-1.4.pl2-docs/
harvest-1.4.pl2-docs/user-manual.ps.gz
developers/lee/
developers/masa/
http://www.weinreichs.de/
harvest/doc/index.html
harvest/doc/sites.html
harvest/doc/download.html
harvest/contrib/index.html
harvest/doc/todo.html
harvest/doc/links.html
harvest/doc/CONTRIBUTORS
harvest/doc/html/manual.html
harvest/doc/html/FAQ.html
harvest/INSTALL.harvest
harvest/ChangeLog
harvest/NEWS
http://harvest.sourceforge.net/
http://sourceforge.net/
keywords{595}: Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
Harvest Homepage
Miscellaneous Documents and Presentations
Directory Index of Work in Progress Version
of Harvest
SourceForge:
Project Info - Harvest
Homepage of stable
Version
Harvest User's Manual 1.4.pl2
(January 31, 1996)
Harvest User's
Manual 1.4.pl2 as PostScript
Kang-Jin Lee
Javier Masa Marin
Harald Weinreich
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
md5{32}: 2ba0877c91bbc00e6db037d5604ea860
uri{31}: http://harvest.sourceforge.net/
file-size{4}: 3142
type{4}: HTML
gatherer-version{6}: 1.9.10
gatherer-host{10}: dyn214.tab
gatherer-name{37}: Contents of the dyn214.tab WWW server
refresh-rate{6}: 604800
time-to-live{7}: 2419200
last-modification-time{10}: 1039163872
description{37}: Harvest: A Distributed Search System
}
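The layout of the sample above (a `@TYPE { URL` header, then `name{size}: value` attributes, then a closing `}`) can be read with a short parser. The following is a minimal sketch, not the Harvest implementation. It assumes that `size` counts the bytes of the value and that a single tab or space follows the colon. The function name `parse_soif` is hypothetical.

```python
import re

# "@FILE { http://..." header line: template type and URL.
HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)[ \t]*\r?\n")
# "name{size}:" followed by one tab or space (assumed separator).
ATTR = re.compile(rb"([^\s{}]+)\{(\d+)\}:[\t ]")


def parse_soif(data: bytes):
    """Parse one SOIF object into (template_type, url, attributes)."""
    m = HEADER.match(data)
    if m is None:
        raise ValueError("missing '@TYPE { URL' header")
    template, url = m.group(1).decode(), m.group(2).decode()
    pos = m.end()
    attrs = {}
    while True:
        # Skip the newline(s) that separate one attribute from the next.
        while data[pos:pos + 1] in (b"\n", b"\r"):
            pos += 1
        if data[pos:pos + 1] == b"}":
            break  # end of the object
        m = ATTR.match(data, pos)
        if m is None:
            raise ValueError(f"unexpected data at offset {pos}")
        size = int(m.group(2))
        start = m.end()
        if start + size > len(data):
            raise ValueError("value is shorter than its declared size")
        # The value is exactly `size` bytes, so it may span several lines.
        # A repeated attribute name overwrites the earlier value here.
        attrs[m.group(1).decode()] = data[start:start + size].decode(
            "utf-8", "replace"
        )
        pos = start + size
    return template, url, attrs
```

Reading by byte count rather than line by line matters because values such as `full-text` and `keywords` contain newlines of their own.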
This is the XML file converted from the sample SOIF file by soif2gils.pl.
<gils>
<availability>
<linkage>
http://harvest.sourceforge.net/
</linkage>
</availability>
<dateOfLastModification>
1039163872
</dateOfLastModification>
<abstract>
Harvest: A Distributed Search System
</abstract>
<author>
</author>
<localSubjectIndex>
<localSubjectTerm>
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
Harvest Homepage
Miscellaneous Documents and Presentations
Directory Index of Work in Progress Version
of Harvest
SourceForge:
Project Info - Harvest
Homepage of stable
Version
Harvest User's Manual 1.4.pl2
(January 31, 1996)
Harvest User's
Manual 1.4.pl2 as PostScript
Kang-Jin Lee
Javier Masa Marin
Harald Weinreich
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
</localSubjectTerm>
</localSubjectIndex>
<supplementalInformation>
<bytes>
3142
</bytes>
</supplementalInformation>
<crossReference>
<linkage>
harvest/doc/index.html
harvest/doc/sites.html
harvest/doc/download.html
harvest/contrib/index.html
harvest/doc/todo.html
harvest/doc/links.html
harvest/doc/CONTRIBUTORS
harvest/doc/html/manual.html
harvest/doc/html/FAQ.html
harvest/INSTALL.harvest
harvest/ChangeLog
harvest/NEWS
harvest/doc/index.html
misc/
wip/
http://sourceforge.net/projects/harvest/
harvest-1.8/doc/index.html
harvest-1.4.pl2-docs/
harvest-1.4.pl2-docs/user-manual.ps.gz
developers/lee/
developers/masa/
http://www.weinreichs.de/
harvest/doc/index.html
harvest/doc/sites.html
harvest/doc/download.html
harvest/contrib/index.html
harvest/doc/todo.html
harvest/doc/links.html
harvest/doc/CONTRIBUTORS
harvest/doc/html/manual.html
harvest/doc/html/FAQ.html
harvest/INSTALL.harvest
harvest/ChangeLog
harvest/NEWS
http://harvest.sourceforge.net/
http://sourceforge.net/
</linkage>
</crossReference>
<title>
Harvest: A Distributed Search System
</title>
<Body-of-text>
Harvest: A Distributed Search System
Harvest: A Distributed Search System
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
Harvest
Harvest Homepage
Miscellaneous Documents and Presentations
Directory Index of Work in Progress Version of Harvest
SourceForge: Project Info - Harvest
Stable Version of Harvest
Homepage of stable Version
Historic Versions of Harvest
Harvest User's Manual 1.4.pl2 (January 31, 1996)
Harvest User's Manual 1.4.pl2 as PostScript
Developers
Kang-Jin Lee
Javier Masa Marin
Harald Weinreich
Home
Sites using Harvest
Download
Contributed Code
Todo List
Links
Contributors
User's Manual
FAQ
Installation
ChangeLog
NEWS
</Body-of-text>
</gils>
This is soif.flt, the SOIF input filter from Zebra. It is not usable as is, but it shows the mapping from SOIF to GILS.
# Crude input-filter for SOIF records -- one record per file.
# Author: Peter Valkenburg / TERENA (valkenburg@terena.nl)
# Version 0.2 (09/09/1998).
# This sort of follows the Nordic Web Index convention of GILS attribute use.
# Modified by Kang-Jin Lee (lee@arco.de)
# 07/10/1999
# We'll use GILS structured records.
BEGIN { begin record gils }
# URL will be GILS' availability/linkage
/^@[A-Za-z](-|[.A-Za-z_])* { / BODY /$/ {
begin element availability
data -element linkage $1
end element
}
# Type will be GILS' availability/linkageType
/^[tT]ype{[0-9]+}:\t/ BODY /$/ {
begin element availability
data -element linkageType $1
end element
}
# Last modification time will be Bib-1 Use Attribute 1012
/^[lL]ast-[mM]odification-[tT]ime{[0-9]+}:\t/ BODY /$/ {
data -element dateOfLastModification $1
}
# The MD5 checksum is used as a unique identifier under Bib-1 Use Attribute 1007
/^[mM][dD]5{[0-9]+}:\t/ BODY /$/ { data -element controlIdentifier $1 }
# Description will be Bib-1 Use Attribute 62
/^[dD]escription{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
data -element abstract $1
unread 2
}
# Author will be Bib-1 Use Attribute 1003 (if gils.abs maps originator to it!!)
/^[aA]uthor{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
data -element author $1
unread 2
}
# Keywords will be GILS' localSubjectIndex/localSubjectTerm
/^[kK]eywords{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
begin element localSubjectIndex
data -element localSubjectTerm $1
unread 2
end element
}
# File-size will be GILS' supplementalInformation/bytes
/^[fF]ile-[sS]ize{[0-9]+}:\t/ BODY /$/ {
begin element supplementalInformation
data -element bytes $1
unread 2
end element
}
# Update-Time will be GILS' supplementalInformation/lastChecked
/^[uU]pdate-[tT]ime{[0-9]+}:\t/ BODY /$/ {
begin element supplementalInformation
data -element lastChecked $1
unread 2
end element
}
# url-references will be GILS' crossReference/linkage
/^[uU]rl-[rR]eferences{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
begin element crossReference
data -element linkage $1
unread 2
end element
}
# Title will be Bib-1 Use Attribute 4
/^[tT]itle{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
data -element Title $1
unread 2
}
# Body and Partial-Text will be Bib-1 Use Attribute 1010
# Is Body really commonly used in SOIF? Anyway, Full-Text is used by Harvest.
#/^[bB]ody{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
# data -element sampleText $1
# unread 2
# }
/^[fF]ull-[tT]ext{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
data -element sampleText $1
unread 2
}
/^[pP]artial-[tT]ext{[0-9]+}:\t/ BODY /^((-|[._A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
data -element sampleText $1
unread 2
}
/^(-|[a-zA-Z0-9])+{[0-9]+}:\t/ BODY /^((-|[_A-Za-z0-9])+{[0-9]+}:\t.*|})$/ {
unread 2
}
END { end record }
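The mapping described in the filter's comments can be summarised as a lookup table from SOIF attribute names to GILS element paths. The following is a rough sketch under stated assumptions: the element names are copied from the filter above, while the function name `soif_to_gils` and the use of Python's `xml.etree.ElementTree` are illustrative choices and not part of Zebra or soif2gils.pl.

```python
import xml.etree.ElementTree as ET

# SOIF attribute name (lower case) -> path of GILS elements,
# as given by the "data -element" lines of the filter.
SOIF_TO_GILS = {
    "type": ("availability", "linkageType"),
    "last-modification-time": ("dateOfLastModification",),
    "md5": ("controlIdentifier",),
    "description": ("abstract",),
    "author": ("author",),
    "keywords": ("localSubjectIndex", "localSubjectTerm"),
    "file-size": ("supplementalInformation", "bytes"),
    "update-time": ("supplementalInformation", "lastChecked"),
    "url-references": ("crossReference", "linkage"),
    "title": ("Title",),
    "full-text": ("sampleText",),
    "partial-text": ("sampleText",),
}


def soif_to_gils(url, attrs):
    """Build a <gils> element tree from a URL and a dict of SOIF attributes."""
    root = ET.Element("gils")

    def add(path, text):
        # Like the filter, open a fresh parent element for every value.
        node = root
        for tag in path:
            node = ET.SubElement(node, tag)
        node.text = text

    # The URL from the "@FILE { URL" line becomes availability/linkage.
    add(("availability", "linkage"), url)
    for name, value in attrs.items():
        path = SOIF_TO_GILS.get(name.lower())
        if path is not None:  # attributes without a mapping are dropped
            add(path, value)
    return root
```

For example, `ET.tostring(soif_to_gils(url, attrs), encoding="unicode")` serialises the record. Unmapped attributes such as `gatherer-host` are skipped, which mirrors the filter's final catch-all rule.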
This is the BIB-1 attribute set. The GILS attribute set includes it as a subset.
# $Id: bib1.att,v 1.1 2002/10/22 12:51:09 adam Exp $
# Bib-1 Attribute Set
name bib1
reference Bib-1
att 1 Personal-name
att 2 Corporate-name
att 3 Conference-name
att 4 Title
att 5 Title-series
att 6 Title-uniform
att 7 ISBN
att 8 ISSN
att 9 LC-card-number
att 10 BNB-card-number
att 11 BGF-number
att 12 Local-number
att 13 Dewey-classification
att 14 UDC-classification
att 15 Bliss-classification
att 16 LC-call-number
att 17 NLM-call-number
att 18 NAL-call-number
att 19 MOS-call-number
att 20 Local-classification
att 21 Subject-heading
att 22 Subject-Rameau
att 23 BDI-index-subject
att 24 INSPEC-subject
att 25 MESH-subject
att 26 PA-subject
att 27 LC-subject-heading
att 28 RVM-subject-heading
att 29 Local-subject-index
att 30 Date
att 31 Date-of-publication
att 32 Date-of-acquisition
att 33 Title-key
att 34 Title-collective
att 35 Title-parallel
att 36 Title-cover
att 37 Title-added-title-page
att 38 Title-caption
att 39 Title-running
att 40 Title-spine
att 41 Title-other-variant
att 42 Title-former
att 43 Title-abbreviated
att 44 Title-expanded
att 45 Subject-precis
att 46 Subject-rswk
att 47 Subject-subdivision
att 48 Number-natl-biblio
att 49 Number-legal-deposit
att 50 Number-govt-pub
att 51 Number-music-publisher
att 52 Number-db
att 53 Number-local-call
att 54 Code-language
att 55 Code-geographic
att 56 Code-institution
att 57 Name-and-title
att 58 Name-geographic
att 59 Place-publication
att 60 CODEN
att 61 Microform-generation
att 62 Abstract
att 63 Note
att 1000 Author-title
att 1001 Record-type
att 1002 Name
att 1003 Author
att 1004 Author-name-personal
att 1005 Author-name-corporate
att 1006 Author-name-conference
att 1007 Identifier-standard
att 1008 Subject-LC-childrens
att 1009 Subject-name-personal
att 1010 Body-of-text
att 1011 Date/time-added-to-db
att 1012 Date/time-last-modified
att 1013 Authority/format-id
att 1014 Concept-text
att 1015 Concept-reference
att 1016 Any 1016,4,1005,62
att 1017 Server-choice
att 1018 Publisher
att 1019 Record-source
att 1020 Editor
att 1021 Bib-level
att 1022 Geographic-class
att 1023 Indexed-by
att 1024 Map-scale
att 1025 Music-key
att 1026 Related-periodical
att 1027 Report-number
att 1028 Stock-number
att 1030 Thematic-number
att 1031 Material-type
att 1032 Doc-id
att 1033 Host-item
att 1034 Content-type
att 1035 Anywhere
att 1036 Author-Title-Subject
This is the GILS attribute set, which will be used in Harvest to store the summarized objects.
# $Id: gils.att,v 1.1 2002/10/22 12:51:09 adam Exp $
name gils
reference GILS-attset
include bib1.att
att 2000 Distributor
att 2001 Distributor-Name
att 2002 Index-Terms # Subject-Terms-Contr.
att 2003 Purpose
att 2004 General-Access-Constraints
att 2005 Use-Constraints
att 2006 Distributor-Organization
att 2007 Distributor-Street-Address
att 2008 Distributor-City
att 2009 Distributor-State-or-Province
att 2010 Distributor-Zip-or-Postal-Code
att 2011 Distributor-Country
att 2012 Distributor-Network-Address
att 2013 Distributor-Hours-of-Service
att 2014 Distributor-Telephone
att 2015 Distributor-Fax
att 2016 Resource-Description
att 2017 Order-Information
att 2018 Technical-Prerequisites
att 2019 Available-Time-Structured
att 2020 Available-Time-Textual
att 2021 Linkage
att 2022 Linkage-Type
att 2023 Contact-Name
att 2024 Contact-Organization
att 2025 Contact-Street-Address
att 2026 Contact-City
att 2027 Contact-State-or-Province
att 2028 Contact-Zip-or-Postal-Code
att 2029 Contact-Country
att 2030 Contact-Network-Address
att 2031 Contact-Hours-of-Service
att 2032 Contact-Telephone
att 2033 Contact-Fax
att 2034 Agency-Program
att 2035 Sources-of-Data
att 2036 Subject-Thesaurus
att 2037 Methodology
att 2038 West-Bounding-Coordinate
att 2039 East-Bounding-Coordinate
att 2040 North-Bounding-Coordinate
att 2041 South-Bounding-Coordinate
att 2042 Place-Keyword
att 2043 Place-Keyword-Thesaurus
att 2044 Time-Period-Structured
att 2045 Time-Period-Textual
att 2046 Cross-Reference-Title
att 2047 Cross-Reference-Linkage
att 2049 Original-Control-Identifier
att 2050 Supplemental-Information
att 2051 Record-Review-Date
att 2052 Originator-Dissemination-Control
att 2053 Security-Classification-Control
att 2054 Cost
att 2055 Cost-Information
att 2056 Schedule-Number
att 2057 Controlled-Subject-Index
att 2058 Uncontrolled-Term
att 2059 Spatial-Domain
att 2060 Bounding-Coordinates
att 2061 Place
att 2062 Time-Period
att 2063 Availability
att 2064 Order-Process
att 2065 Available-Time-Period
att 2066 Access-Constraints
att 2067 Point-of-Contact
att 2068 Cross-Reference
att 2069 Available-Linkage
att 2070 Cross-Reference-Relationship
att 2071 Language-of-Record
att 2072 Beginning-Date
att 2073 Ending-Date
att 2074 Controlled-Term
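To look up the numbers that the filter comments refer to (for example "Bib-1 Use Attribute 4" for Title), an attribute set file can be loaded into a dictionary. This is a small sketch that assumes the file has one directive per line (`name`, `reference`, `include` or `att NUMBER NAME`) and that `#` starts a comment. The function name `parse_att` is hypothetical, and `include` directives are only recorded, not followed.

```python
def parse_att(text):
    """Parse the text of an attribute set file into a dictionary."""
    info = {"name": None, "reference": None, "includes": [], "atts": {}}
    for raw in text.splitlines():
        # Drop comments (whole-line or trailing) and surrounding blanks.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        key = fields[0]
        if key == "att" and len(fields) >= 3:
            # "att 4 Title"; any further field (as on att 1016) is ignored.
            info["atts"][int(fields[1])] = fields[2]
        elif key == "include" and len(fields) >= 2:
            info["includes"].append(fields[1])
        elif key in ("name", "reference") and len(fields) >= 2:
            info[key] = fields[1]
    return info
```

With the two files above, `parse_att(bib1_text)["atts"][1012]` would give `Date/time-last-modified`, and the `includes` list of gils.att would name `bib1.att`.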
This is a sample GILS file from Zebra.
<gils>
<Title>
UTAH EARTHQUAKE EPICENTERS
<Acronym> UUCCSEIS </Acronym>
</Title>
<Originator> UTAH GEOLOGICAL AND MINERAL SURVEY </Originator>
<Local-Subject-Index>
APPALACHIAN VALLEY; EARTHQUAKE; EPICENTER; SEISMOLOGY; UTAH
</Local-Subject-Index>
<Abstract>
Five files of epicenter data arranged by date comprise this data set.
These files are searchable by magnitude and longitude/latitude.
Hardcopy of listing and plot of requested area available.
Epicenter location and date, magnitude, and focal depth available.
<Format> DIGITAL DATA SETS </Format>
<Data-Category> TERRESTRIAL </Data-Category>
<Comments>
Data are supplied by the University of Utah Seismograph Station.
The Utah Geologcial and Mineral Survey (UGMS) is merely a clearinghouse of the data.
</Comments>
</Abstract>
<Spatial-Domain>
<Geographic-Coverage> US STATE </Geographic-Coverage>
<Coverage-Description> UTAH </Coverage-Description>
<Bounding-Coordinates>
<West-Bounding-Coordinate> -114 </West-Bounding-Coordinate>
<East-Bounding-Coordinate> -109 </East-Bounding-Coordinate>
<North-Bounding-Coordinate> 42 </North-Bounding-Coordinate>
<South-Bounding-Coordinate> 37 </South-Bounding-Coordinate>
</Bounding-Coordinates>
</Spatial-Domain>
<Time-Period>
<Time-Period-Textual> -PRESENT </Time-Period-Textual>
</Time-Period>
<Availability>
<Distributor>
<Organization> UTAH GEOLOGICAL AND MINERAL SURVEY </Organization>
<Street-Address> 606 BLACK HAWK WAY </Street-Address>
<City> SALT LAKE CITY </City>
<State> UT </State>
<Zip-Code> 84108 </Zip-Code>
<Country> USA </Country>
<Telephone> (801) 581-6831 </Telephone>
</Distributor>
<Resource-Description> UTAH EARTHQUAKE EPICENTERS </Resource-Description>
<Technical-Prerequisites>
<Data-Set-Type> AUTOMATED </Data-Set-Type>
<Access-Method> BATCH </Access-Method>
<Number-of-Records> 8,700 </Number-of-Records>
<Computer-Type> PC NETWORK </Computer-Type>
<Computer-Location> SALT LAKE CITY, UT </Computer-Location>
</Technical-Prerequisites>
</Availability>
<Access-Constraints>
<Documentation> NONE </Documentation>
</Access-Constraints>
<Use-Constraints>
<Status> OPERATIONAL </Status>
</Use-Constraints>
<Point-of-Contact>
<Name> BILL CASE </Name>
<Organization> UTAH GEOLOGICAL AND MINERAL SURVEY </Organization>
<Street-Address> 606 BLACK HAWK WAY </Street-Address>
<City> SALT LAKE CITY </City>
<State> UT </State>
<Zip-Code> 84108 </Zip-Code>
<Country> USA </Country>
<Telephone> (801) 581-6831 </Telephone>
</Point-of-Contact>
<Control-Identifier> ESDD0006 </Control-Identifier>
<Record-Source> UTAH GEOLOGICAL AND MINERAL SURVEY </Record-Source>
<Date-of-Last-Modification> 198903 </Date-of-Last-Modification>
</gils>