Algorithms

Running algorithms is currently only available through web services as described here.

WSDL interfaces of the algorithms listed here can be obtained from here and here.
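
As a rough illustration of what calling one of these services might involve, the sketch below builds a SOAP 1.1 request envelope using only the Python standard library. It assumes the WSDL describes a SOAP 1.1 binding. The service namespace, the operation name and the file name are hypothetical placeholders, and the element names merely reuse the parameter labels from the convert entry below; the real names must be taken from the WSDL itself.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (fixed by the SOAP 1.1 note).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace: the real one is declared in the WSDL.
SERVICE_NS = "urn:example:algorithms"


def build_request(operation, arguments):
    """Return a SOAP 1.1 envelope (as bytes) calling `operation` with named arguments."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in arguments.items():
        # One child element per argument, in the order given.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Hypothetical call: operation and argument names are assumptions, not the real interface.
request = build_request("convert", {"page_image": "page-0001.tif", "extension": "png"})
print(request.decode("utf-8"))
```

The resulting bytes would be sent as the body of an HTTP POST to the endpoint address given in the WSDL; that step is left out here because the address is not part of this page.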

  • convert - Version 6.4.6
    Successful Executions: 5814
    Ratio of executions (with respect to other algorithms): 0.3442
    This algorithm is distributed with the ImageMagick toolbox (http://www.imagemagick.org/). It converts images between formats such as jpg, png, pnm, and tif.

    Input Parameters

    Argument 1 (page_image)
    The input image to the convert algorithm
    Argument 2 (extension)
    Filename extension (and hence type) of the convert output

    Output Parameters

    Argument 2 (page_image)
    The output image from the convert algorithm
  • Stanford-NER - Version 0
    Successful Executions: 4446
    Ratio of executions (with respect to other algorithms): 0.2632
    Named entity detection algorithm by Jenny Finkel and Christopher Manning, as described in Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. "Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling". Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370.

    Input Parameters

    Argument 1 (text/plain)
    The input text file for the Stanford-NER algorithm

    Output Parameters

    Argument 1 (NER tagged)
    The tagged output file for the Stanford-NER algorithm
  • Tesseract - Version 2.04
    Successful Executions: 2612
    Ratio of executions (with respect to other algorithms): 0.1546
    This is an OCR algorithm by HP Labs and Google. "An Overview of the Tesseract OCR Engine", Ray Smith, Proc. Ninth Int. Conference on Document Analysis and Recognition (ICDAR), 2007, pp. 629-633. DOI: 10.1109/ICDAR.2007.4376991

    Input Parameters

    Argument 1 (image/tif 8bit uncompressed)
    The input image to the Tesseract algorithm.

    Output Parameters

    Argument 2 (text/plain)
    The output file from the Tesseract algorithm with the OCR transcription.
  • ocrad - Version 0.19
    Successful Executions: 2246
    Ratio of executions (with respect to other algorithms): 0.1330
    This is an OCR algorithm supported by the GNU project (http://www.gnu.org/s/ocrad/). It takes a pnm image as input and produces layout information and OCR results as output.

    Input Parameters

    Argument 1 (image/x-portable-anymap)
    The input image to the ocrad algorithm.

    Output Parameters

    Argument 2 (text/plain)
    OCR transcription output from the ocrad algorithm.
    Argument 3 (ocrad layout)
    Layout output from the ocrad algorithm.
  • NCI-CADD segmentation - Version 1.0
    Successful Executions: 752
    Ratio of executions (with respect to other algorithms): 0.0445
    Segments a black-and-white input image into text, line drawing and picture images. Contribution by the NCI/CADD group at the National Institutes of Health for the ICDAR 2011 Document Analysis Algorithm Contributions in End-to-End Applications Contest.

    Input Parameters

    Argument 1 (page_image)
    input black-and-white image

    Output Parameters

    Argument 1 (page_image)
    segmented image of text
    Argument 2 (page_image)
    segmented image of line drawings
    Argument 3 (page_image)
    segmented image of photos and pictures
  • MergeImageList - Version 1.0
    Successful Executions: 669
    Ratio of executions (with respect to other algorithms): 0.0396
    Takes a list of images as input and creates a new image by merging them all together, using the ImageMagick toolbox (http://www.imagemagick.org/)

    Input Parameters

    Argument 1 (page_image_array)
    input for MergeImageList

    Output Parameters

    Argument 1 (page_image)
    Output of MergeImageList
  • NCI-CADD binarization - Version 1.0
    Successful Executions: 290
    Ratio of executions (with respect to other algorithms): 0.0172
    Converts color or grayscale image to black-and-white. Winning contribution by the NCI/CADD group at the National Institutes of Health for the ICDAR 2011 Document Analysis Algorithm Contributions in End-to-End Applications Contest.

    Input Parameters

    Argument 1 (page_image)
    input color or grayscale image file

    Output Parameters

    Argument 1 (page_image)
    output black-and-white image file
  • DICE - Version 1.0
    Successful Executions: 38
    Ratio of executions (with respect to other algorithms): 0.0022
    DICE stands for Document Image Content Extraction. It aims to find regions containing machine-printed text (MP), handwriting (HW), photographs (PH), line-art (LA), etc. DICE classifies individual pixels, which avoids arbitrary and restrictive region shapes. To date, DICE has been implemented to classify MP, HW, PH and Blank (BL). Sui-Yu Wang, Henry S. Baird, Chang An: "Document Content Extraction Using Automatically Discovered Features". ICDAR 2009: 1076-1080. DOI: 10.1109/ICDAR.2009.198

    Input Parameters

    Output Parameters

  • QGar Arc Detection - Version 1.0
    Successful Executions: 12
    Ratio of executions (with respect to other algorithms): 0.0007
    Arc detection algorithm as described in B. Lamiroy and Y. Guebbas, "Robust and Precise Circular Arc Detection", in Graphics Recognition. Achievements, Challenges, and Evolution, Lecture Notes in Computer Science, 2010, Volume 6020/2010, 49-60, DOI: 10.1007/978-3-642-13728-0_5.

    Input Parameters

    Argument 1 (boolean)
    If set to false or 0, only full circles are detected; if set to true or 1, circular arcs are detected as well. The default value is true.
    Argument 2 (image/pbm)
    pbm graphic image in which circles or arcs will be detected

    Output Parameters

    Argument 1 (VECfile)
    detected arcs and circles
  • Kanungo Degradation - Version 1.0
    Successful Executions: 8
    Ratio of executions (with respect to other algorithms): 0.0005
    This is an image degradation algorithm due to Kanungo et al., as described in T. Kanungo, R.M. Haralick, H.S. Baird, W. Stuetzle, and D. Madigan. "A statistical, nonparametric methodology for document degradation model validation". IEEE Transactions on PAMI, 22(11):1209-1223, November 2000. It takes a pbm image as input. This particular implementation is part of the Qgar software package.

    Input Parameters

    Output Parameters

  • ArcEval - Version 2005
    Successful Executions: 5
    Ratio of executions (with respect to other algorithms): 0.0003
    Evaluation Software for Arc Detection by Dr. Liu Wenyin. Liu Wenyin and Dov Dori, "A Protocol for Performance Evaluation of Line Detection Algorithms", in Machine Vision and Applications, Special Issue on Performance Characteristics of Vision Algorithms, Vol. 9, No. 5/6, pp. 240-250, 1997.

    Input Parameters

    Argument 1 (VECfile)
    file containing measured arcs to compare to "ground truth" in Vec format
    Argument 2 (VECfile)
    Reference file containing "ground truth" arcs in Vec format

    Output Parameters

    Argument 1 (ArcEval 2005)
    Measured difference between the two provided input files
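
The "Ratio of executions" figures above can be reproduced from the execution counts. The sketch below assumes that each ratio is the algorithm's share of all successful executions, with the total taken over the eleven algorithms listed on this page only; under that assumption the rounded results match the listed values.

```python
# Successful-execution counts copied from the listing above.
executions = {
    "convert": 5814,
    "Stanford-NER": 4446,
    "Tesseract": 2612,
    "ocrad": 2246,
    "NCI-CADD segmentation": 752,
    "MergeImageList": 669,
    "NCI-CADD binarization": 290,
    "DICE": 38,
    "QGar Arc Detection": 12,
    "Kanungo Degradation": 8,
    "ArcEval": 5,
}

# Assumption: only the algorithms listed on this page count toward the total.
total = sum(executions.values())


def ratio(name):
    """Share of all successful executions, rounded to four decimals."""
    return round(executions[name] / total, 4)


for name, count in executions.items():
    print(f"{name:24s}{count:6d}  {ratio(name):.4f}")
```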