iOS Tesseract OCR Image Preparation

Question


I would like to implement an OCR application that recognizes text in photos.

I managed to compile and integrate the Tesseract engine on iOS, and I get reasonable detection when photographing clear documents (or a photo of such text taken from a screen). For other text, such as signposts, shop signs, or text on colored backgrounds, detection fails.

My question is: what image-processing preparation is needed to get better recognition? For example, I expect that we need to convert the images to grayscale or black and white, and also fix the contrast, etc.

How can this be done on iOS? Is there a package for this?

15

ios image-processing ocr tesseract

Author: alandalusi, 2012-11-22


2 answers

I used the code from the other answer, but I also added two more function calls to convert the image so that it works with Tesseract.

First I used an image resize routine to convert the image to 640 x 640, which seems to be more manageable for Tesseract.

-(UIImage *)resizeImage:(UIImage *)image {

    CGImageRef imageRef = [image CGImage];
    CGImageAlphaInfo alphaInfo = CGImageGetAlphaInfo(imageRef);
    CGColorSpaceRef colorSpaceInfo = CGColorSpaceCreateDeviceRGB();

    if (alphaInfo == kCGImageAlphaNone)
        alphaInfo = kCGImageAlphaNoneSkipLast;

    int width, height;

    width = 640;//[image size].width;
    height = 640;//[image size].height;

    CGContextRef bitmap;

    // Left/Right orientations need the context dimensions swapped.
    // Passing 0 for bytesPerRow lets Core Graphics compute the stride for the
    // new width; the source image's stride does not match the resized bitmap.
    if (image.imageOrientation == UIImageOrientationUp || image.imageOrientation == UIImageOrientationDown) {
        bitmap = CGBitmapContextCreate(NULL, width, height, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    } else {
        bitmap = CGBitmapContextCreate(NULL, height, width, CGImageGetBitsPerComponent(imageRef), 0, colorSpaceInfo, alphaInfo);

    }

    if (image.imageOrientation == UIImageOrientationLeft) {
        NSLog(@"image orientation left");
        CGContextRotateCTM (bitmap, radians(90));
        CGContextTranslateCTM (bitmap, 0, -height);

    } else if (image.imageOrientation == UIImageOrientationRight) {
        NSLog(@"image orientation right");
        CGContextRotateCTM (bitmap, radians(-90));
        CGContextTranslateCTM (bitmap, -width, 0);

    } else if (image.imageOrientation == UIImageOrientationUp) {
        NSLog(@"image orientation up");

    } else if (image.imageOrientation == UIImageOrientationDown) {
        NSLog(@"image orientation down");
        CGContextTranslateCTM (bitmap, width,height);
        CGContextRotateCTM (bitmap, radians(-180.));

    }

    CGContextDrawImage(bitmap, CGRectMake(0, 0, width, height), imageRef);
    CGImageRef ref = CGBitmapContextCreateImage(bitmap);
    UIImage *result = [UIImage imageWithCGImage:ref];

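    CGContextRelease(bitmap);
    CGImageRelease(ref);
    // The color space was created with CGColorSpaceCreateDeviceRGB, so release it too.
    CGColorSpaceRelease(colorSpaceInfo);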

    return result;
}

For radians to work, declare it above the @implementation:

static inline double radians (double degrees) {return degrees * M_PI/180;}
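The resize method above draws every photo into a fixed 640 x 640 canvas, so non-square photos change their aspect ratio. If that distortion turns out to matter for recognition (an assumption worth testing, not something measured here), the target size could be computed so that only the longer side becomes 640. The sketch below is plain C, which compiles unchanged inside an Objective-C file; FitSize and fit_within are hypothetical names, not part of the answer's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a target size in pixels. */
typedef struct {
    size_t width;
    size_t height;
} FitSize;

/* Scale (srcW x srcH) so that its longer side equals maxSide while
 * keeping the aspect ratio. The shorter side is rounded to the nearest
 * pixel and never drops below 1. */
static FitSize fit_within(size_t srcW, size_t srcH, size_t maxSide) {
    assert(srcW > 0 && srcH > 0 && maxSide > 0);
    FitSize out;
    if (srcW >= srcH) {
        out.width = maxSide;
        out.height = (srcH * maxSide + srcW / 2) / srcW;
    } else {
        out.height = maxSide;
        out.width = (srcW * maxSide + srcH / 2) / srcH;
    }
    if (out.width == 0) out.width = 1;
    if (out.height == 0) out.height = 1;
    return out;
}
```

For a 3264 x 2448 camera photo, fit_within(3264, 2448, 640) gives 640 x 480; those two values could then replace the hard-coded width and height in resizeImage:.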

Then I convert to grayscale.

I found this article, "Convert image to grayscale", on converting to grayscale.

I have successfully used the code from there and can now read text in different colors and on different colored backgrounds.

I modified the code slightly so that it works as a method within a class rather than as its own class, which is how the other person did it.

- (UIImage *) toGrayscale:(UIImage*)img
{
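    // With kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast each
    // pixel is stored in memory as A, B, G, R, so these are the byte offsets.
    const int BLUE = 1;
    const int GREEN = 2;
    const int RED = 3;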

    // Create image rectangle with current image width/height
    CGRect imageRect = CGRectMake(0, 0, img.size.width * img.scale, img.size.height * img.scale);

    int width = imageRect.size.width;
    int height = imageRect.size.height;

    // the pixels will be painted to this array
    uint32_t *pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));

    // clear the pixels so any transparency is preserved
    memset(pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [img CGImage]);

    for(int y = 0; y < height; y++) {
        for(int x = 0; x < width; x++) {
            uint8_t *rgbaPixel = (uint8_t *) &pixels[y * width + x];

            // convert to grayscale using recommended method:     http://en.wikipedia.org/wiki/Grayscale#Converting_color_to_grayscale
            uint32_t gray = 0.3 * rgbaPixel[RED] + 0.59 * rgbaPixel[GREEN] + 0.11 * rgbaPixel[BLUE];

            // set the pixels to gray
            rgbaPixel[RED] = gray;
            rgbaPixel[GREEN] = gray;
            rgbaPixel[BLUE] = gray;
        }
    }

    // create a new CGImageRef from our context with the modified pixels
    CGImageRef image = CGBitmapContextCreateImage(context);

    // we're done with the context, color space, and pixels
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    free(pixels);

    // make a new UIImage to return
    UIImage *resultUIImage = [UIImage imageWithCGImage:image
                                             scale:img.scale
                                       orientation:UIImageOrientationUp];

    // we're done with image now too
    CGImageRelease(image);

    return resultUIImage;
}

9

Author: Adam Richardson,
2017-05-23 12:16:39

score 15 · Accepted Answer

I'm currently working on the same thing. I found that a PNG saved in Photoshop worked fine, but an image that originally came from the camera and was then imported into the app never worked. Don't ask me to explain it, but applying this function made those images work. Maybe it will work for you too.

// this does the trick to have tesseract accept the UIImage.
UIImage * gs_convert_image (UIImage * src_img) {
    CGColorSpaceRef d_colorSpace = CGColorSpaceCreateDeviceRGB();
    /*
     * Note we specify 4 bytes per pixel here even though we ignore the
     * alpha value; you can't specify 3 bytes per-pixel.
     */
    size_t d_bytesPerRow = src_img.size.width * 4;
    unsigned char * imgData = (unsigned char*)malloc(src_img.size.height*d_bytesPerRow);
    CGContextRef context =  CGBitmapContextCreate(imgData, src_img.size.width,
                                                  src_img.size.height,
                                                  8, d_bytesPerRow,
                                                  d_colorSpace,
                                                  kCGImageAlphaNoneSkipFirst);

    UIGraphicsPushContext(context);
    // These next two lines 'flip' the drawing so it doesn't appear upside-down.
    CGContextTranslateCTM(context, 0.0, src_img.size.height);
    CGContextScaleCTM(context, 1.0, -1.0);
    // Use UIImage's drawInRect: instead of the CGContextDrawImage function, otherwise you'll have issues when the source image is in portrait orientation.
    [src_img drawInRect:CGRectMake(0.0, 0.0, src_img.size.width, src_img.size.height)];
    UIGraphicsPopContext();

    /*
     * At this point, we have the raw ARGB pixel data in the imgData buffer, so
     * we can perform whatever image processing here.
     */


    // After we've processed the raw data, turn it back into a UIImage instance.
    CGImageRef new_img = CGBitmapContextCreateImage(context);
    UIImage * convertedImage = [[UIImage alloc] initWithCGImage:
                                 new_img];

    CGImageRelease(new_img);
    CGContextRelease(context);
    CGColorSpaceRelease(d_colorSpace);
    free(imgData);
    return convertedImage;
}

I have also done a lot of experimenting with preparing the image for Tesseract. Resizing, converting to grayscale, and then adjusting brightness and contrast seems to work best.
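The answer does not say how the brightness and contrast were adjusted. One simple option is a linear contrast stretch, sketched below in plain C (which compiles unchanged inside an Objective-C file). It assumes you already have an 8-bit grayscale buffer with one byte per pixel; stretch_contrast is a hypothetical name, and this is not necessarily the adjustment the author used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linear contrast stretch on an 8-bit grayscale buffer, in place:
 * the darkest value present maps to 0 and the brightest to 255. */
static void stretch_contrast(uint8_t *gray, size_t count) {
    assert(gray != NULL || count == 0);
    if (count == 0) return;

    /* Find the darkest and brightest values in the image. */
    uint8_t lo = 255, hi = 0;
    for (size_t i = 0; i < count; i++) {
        if (gray[i] < lo) lo = gray[i];
        if (gray[i] > hi) hi = gray[i];
    }
    if (hi == lo) return; /* flat image: nothing to stretch */

    /* Rescale every pixel from [lo, hi] to [0, 255], rounding to nearest. */
    unsigned range = (unsigned)(hi - lo);
    for (size_t i = 0; i < count; i++) {
        unsigned v = (unsigned)(gray[i] - lo) * 255u;
        gray[i] = (uint8_t)((v + range / 2) / range);
    }
}
```

A buffer holding 50, 100, 150 becomes 0, 128, 255. A single very dark or very bright pixel limits how much the stretch can do, so clipping a small percentage at each end is a common refinement.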

I also tried the GPUImage library: https://github.com/BradLarson/GPUImage. Its GPUImageAverageLuminanceThresholdFilter seems to me to give a great adjusted image, but Tesseract doesn't seem to work well with it.
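To show what average-luminance thresholding means, here is a minimal CPU sketch in plain C (which compiles unchanged inside an Objective-C file). It is not GPUImage's implementation, only the same basic idea applied to an 8-bit grayscale buffer; threshold_at_mean is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Binarize an 8-bit grayscale buffer in place: pixels brighter than the
 * mean become 255 (white), all others become 0 (black).
 * Returns the mean value that was used as the threshold. */
static uint8_t threshold_at_mean(uint8_t *gray, size_t count) {
    assert(gray != NULL || count == 0);
    if (count == 0) return 0;

    /* Average luminance of the whole image (integer division). */
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) sum += gray[i];
    uint8_t mean = (uint8_t)(sum / count);

    for (size_t i = 0; i < count; i++) {
        gray[i] = (gray[i] > mean) ? 255 : 0;
    }
    return mean;
}
```

For the pixels 10, 20, 200, 210 the mean is 110 and the result is 0, 0, 255, 255. A single global threshold like this can erase text in unevenly lit photos, which may be one reason the binarized output did not help here.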

I have also added OpenCV to my project and plan to try its image routines. Possibly even some box detection to find the text area (I hope this will speed up Tesseract).
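As a very rough stand-in for box detection, the sketch below finds the smallest rectangle containing every dark pixel in a grayscale buffer, so that only that region needs to be passed to Tesseract. It is plain C (which compiles unchanged inside an Objective-C file) and does not use OpenCV; PixelBox and dark_bounding_box are hypothetical names, and it assumes dark text on a light, reasonably clean background.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: inclusive pixel bounds of the dark region. */
typedef struct {
    size_t minX, minY, maxX, maxY;
} PixelBox;

/* Find the smallest rectangle containing every pixel darker than `cutoff`
 * in a row-major 8-bit grayscale buffer. Returns false if there is none. */
static bool dark_bounding_box(const uint8_t *gray, size_t width, size_t height,
                              uint8_t cutoff, PixelBox *out) {
    assert(gray != NULL && out != NULL);
    bool found = false;
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (gray[y * width + x] >= cutoff) continue; /* light pixel */
            if (!found) {
                /* Rows are scanned top to bottom, so the first hit fixes minY. */
                out->minX = out->maxX = x;
                out->minY = out->maxY = y;
                found = true;
            } else {
                if (x < out->minX) out->minX = x;
                if (x > out->maxX) out->maxX = x;
                if (y > out->maxY) out->maxY = y;
            }
        }
    }
    return found;
}
```

On a real photo, stray dark pixels would inflate the box, so some noise removal or a proper connected-component pass would likely be needed first.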