Johannes Gutenberg Universität Mainz

Publications

2009

Thomas Gottron, Content Extraction: Bestimmung des Hauptinhaltes in HTML Dokumenten
Ausgezeichnete Informatikdissertationen 2008, Dorothea Wagner et al. (eds.), Lecture Notes in Informatics, 2009, 101–110. [PDF]

Yves Weißig, Thomas Gottron, Combinations of Content Extraction Algorithms
LWA'09: Workshop Information Retrieval, 2009. [PDF] [BibTeX]

Constanze Lipowsky, Egor Dranischnikow, Herbert Göttler, Thomas Gottron, Mathias Kemeter, Elmar Schömer, Alignment of Noisy and Uniformly Scaled Time Series
DEXA'09: Proceedings of the 20th International Conference on Database and Expert Systems Applications, 2009, 675–688. [PDF] [BibTeX]

Thomas Gottron, Document Word Clouds: Visualising Web Documents as Tag Clouds to Aid Users in Relevance Decisions
ECDL'09: Proceedings of the 13th European Conference on Digital Libraries, 2009, 94–105. [PDF] [BibTeX]

Thomas Gottron, Roman Schneider, A Hybrid Approach to Statistical and Semantical Analysis of Web Documents
EuroIMSA'09: Proceedings of the 5th European Conference on Internet and Multimedia Systems and Applications, 2009, 115–120. [PDF] [BibTeX]

Thomas Gottron, An Evolutionary Approach to Automatically Optimise Web Content Extraction
IIS'09: Proceedings of the 17th International Conference Intelligent Information Systems, 2009, 331–343. [PDF] [BibTeX]

Thomas Gottron, Detecting Website Redesigns via Template Similarity on Streams of Documents
ITA'09: Proceedings of the 3rd International Conference on Internet Technologies and Applications, 2009, 35–43. [PDF] [BibTeX]
ITA'09 Best Paper Award

Thomas Gottron, Ludger Martin, Estimating Web Site Readability Using Content Extraction
WWW'09: Proceedings of the 18th International Conference on World Wide Web; Posters Track, 2009, 1169–1170. [PDF] [BibTeX]


2008

Thomas Gottron, Content Extraction: Identifying the Main Content in HTML Documents
Dissertation, Johannes Gutenberg-Universität Mainz, 2008. [PDF] [BibTeX]

Thomas Gottron, Combining Content Extraction Heuristics: The CombinE System
iiWAS'08: Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services; Special Track: Emerging Research Projects, Applications and Services (ERPAS), 2008, 591–595. [PDF] [BibTeX]

Thomas Gottron, Content Code Blurring: A New Approach to Content Extraction
TIR'08: Proceedings of the 5th International Workshop on Text Information Retrieval, 2008, 29–33. [PDF] [BibTeX]

Thomas Gottron, Clustering Template Based Web Documents
ECIR'08: Proceedings of the 30th European Conference on Information Retrieval, 2008, 40–51. [PDF] [BibTeX]

Thomas Gottron, Bridging the Gap: From Multi Document Template Detection to Single Document Content Extraction
EuroIMSA'08: Proceedings of the IASTED Conference on Internet and Multimedia Systems and Applications, 2008, 66–71. [PDF] [BibTeX]


2007

Thomas Gottron, Evaluating Content Extraction on HTML Documents
ITA'07: Proceedings of the 2nd International Conference on Internet Technologies and Applications, 2007, 123–132. [PDF] [BibTeX]

 


Institut für Informatik, 7 November 2009