OCR software for Japanese PDFs with various formatting in one file
Thread poster: A_Lilie
A_Lilie
A_Lilie
Germany
Local time: 14:27
Japanese to German
+ ...
Sep 18, 2019

I'm hoping someone here can point me to an OCR software that can recognize Japanese text in various formats throughout the same file.

I have tried a lot of programs over the past week, but never got good results. I'm talking about files with left-to-right text, top-to-bottom text, right-to-left text, and text in various colors (black, but also light blue or orange), like you get on posters, all in the same file. Most of the software I tried didn't even recognize most of the text, especially if it wasn't black on white. As far as simple text recognition goes, I had the best results with just Google Drive, converting the PDF documents into simple Word documents, but with this method the text is all over the place and I got some errors in the recognition of more complicated characters.

I also tried some Japanese software, but I'm having trouble installing it on my computer. Unfortunately, the error messages I get are themselves gibberish. I suspect they were originally in Japanese, which couldn't be displayed on a German Windows 10, so they were converted to symbols. I think the root of the error is that, when downloading, some of the Japanese file names were converted into symbols and now the installer can't find them. Sorry if that doesn't make much sense, I'm really not good with computers. Please do tell me if any of you have any tips about installing Japanese programs on a non-Japanese system, though!

At the moment I just use Google Drive and then puzzle the text back together myself, but I'd like to improve my workflow for projects involving those PDFs, as I keep getting more of them. I'm not bound by any NDA at the moment, but for the future I would also prefer methods that don't involve uploading the files anywhere.

Thank you so much!


 
Adam Warren
Adam Warren  Identity Verified
France
Local time: 14:27
Member (2005)
French to English
Off the cuff, try Nuance Power PDF Sep 18, 2019

although I have no idea whether it supports Japanese. HTH

 
esperantisto
esperantisto  Identity Verified
Local time: 16:27
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
ABBYY FineReader… Sep 18, 2019

… as of version 11, if I'm not mistaken, can recognize CJK out of the box. But don't expect a magic wand, tons of errors are inevitable in complex scripts. Better to ask your clients to supply the source documents from which the PDFs were created.

 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 13:27
Member (2014)
Japanese to English
Inherently difficult Sep 18, 2019

esperantisto wrote:
But don't expect a magic wand, tons of errors are inevitable in complex scripts.

I agree. The examples the OP gives are exactly the kind that are challenging for OCR software. I use FineReader, and it works well on simple Japanese documents, but often very poorly on complex files.

Be crystal clear to the client that you are offering a translation, not a DTP service, unless the latter is paid for. If you accept such jobs, you need to include a quote for OCR'ing the original and correcting errors, etc. in the resulting file, at an hourly rate that seems fair to you. Maybe start at 50 euro or so.

Also remember to charge for formatting of the target file (usually a Word document) if the client wants that. I usually suggest that staff at the end client do this kind of formatting.

Personally, I try to avoid such projects.

Regards,
Dan


 
A_Lilie
A_Lilie
Germany
Local time: 14:27
Japanese to German
+ ...
TOPIC STARTER
ABBYY didn't work :( Sep 18, 2019

esperantisto wrote:

… as of version 11, if I'm not mistaken, can recognize CJK out of the box. But don't expect a magic wand, tons of errors are inevitable in complex scripts. Better to ask your clients to supply the source documents from which the PDFs were created.



Thanks, I tried ABBYY FineReader, but unfortunately it doesn't seem to recognize colored text that isn't left to right at all. I didn't realize this is such a rare thing, actually, because Google Drive can read and convert most of the text, it's just jumbled up. What bugs me is that the script isn't all that complex; it's just the formatting that doesn't comply with Western standards and seems to be the issue. What a bummer, but I'll keep looking.

If I could get the material from my client, I would definitely not have asked for a workaround here, so that's not an option, unfortunately.

Thanks anyway!


 
Dan Lucas
Dan Lucas  Identity Verified
United Kingdom
Local time: 13:27
Member (2014)
Japanese to English
Japanese software Sep 19, 2019

A_Lilie wrote:
Thanks, I tried ABBYY FineReader, but unfortunately it doesn't seem to recognize colored text that isn't left to right at all.

Depending on how desperate you are, you might want to try the software developed in Japan, such as 読取革命 or e.Typist. I think Panasonic had one as well. There will probably be trial versions of them all. 読んde!!ココ used to be quite popular, but I don't think it's being updated or supported any more.

Dan


 
A_Lilie
A_Lilie
Germany
Local time: 14:27
Japanese to German
+ ...
TOPIC STARTER
Thanks for your reply! Sep 19, 2019

Dan Lucas wrote:

A_Lilie wrote:
Thanks, I tried ABBYY FineReader, but unfortunately it doesn't seem to recognize colored text that isn't left to right at all.

Depending on how desperate you are, you might want to try the software developed in Japan, such as 読取革命 or e.Typist. I think Panasonic had one as well. There will probably be trial versions of them all. 読んde!!ココ used to be quite popular, but I don't think it's being updated or supported any more.

Dan


I did research OCR software in Japanese and found some great review sites comparing what's on the market and how accurate the respective software is. Yes, 読んde!!ココ seems to be great and widely used, but it is discontinued, so there will be no more updates and it will definitely fade out in the coming years, so I'm not sure about buying it now. I couldn't find a trial version either, but once I read that it isn't updated anymore, I admittedly didn't spend much time looking. I tried the trial version of Panasonic's 読取革命, which does have good reviews for accuracy, but as I mentioned in my original post, I can't get the software to run, as I keep getting error messages that are just symbols or literally "????? ?????". I have no idea what's causing this. Changing my system's language to Japanese didn't change anything either, so I basically gave up. I think I'll try the Windows help desk next, though!

Do you have one of the Japanese pieces of software installed? If so, did you run into any errors and manage to fix them? I'm really hesitant to buy the software if I can't be sure I can install it.


Edit: Okay, I searched some more. Panasonic specifically mentions that 読取革命 is compatible with the Japanese version of Windows 10, and I found a lot of people having problems running Japanese software on Windows systems that are not Japanese. I guess that is the main issue I'm having. So I'd be really glad if anyone knows of a workaround!

[Edited at 2019-09-19 15:45 GMT]
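
One possible explanation for error messages that show up as symbols or as "????? ?????" can be sketched in Python. This is only an illustration, not a diagnosis: it assumes the installer stores its text in the Japanese code page (cp932, a Shift_JIS variant) and that a German Windows falls back to the Western code page (cp1252) for non-Unicode programs. The function names are made up for this example.

```python
# Illustration only: how Japanese text can degrade on a system that uses a
# Western code page. The two code pages below are assumptions.

JAPANESE_CODE_PAGE = "cp932"   # Shift_JIS variant used by Japanese Windows
WESTERN_CODE_PAGE = "cp1252"   # typical for a German Windows


def misread_as_western(text: str) -> str:
    """Store text as Japanese bytes, then read those bytes as Western text."""
    raw = text.encode(JAPANESE_CODE_PAGE)
    # Bytes that mean nothing in cp1252 become the replacement character.
    return raw.decode(WESTERN_CODE_PAGE, errors="replace")


def squeeze_into_western(text: str) -> str:
    """Convert text to the Western code page; unsupported characters become '?'."""
    raw = text.encode(WESTERN_CODE_PAGE, errors="replace")
    return raw.decode(WESTERN_CODE_PAGE)


name = "読取革命"
garbled = misread_as_western(name)     # unrelated Western letters and symbols
flattened = squeeze_into_western(name)
print(flattened)                       # ????
```

If this is what is happening, changing the display language alone would not be expected to help, because the fallback code page for non-Unicode programs is a separate Windows setting (the system locale).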


 
esperantisto
esperantisto  Identity Verified
Local time: 16:27
Member (2006)
English to Russian
+ ...
SITE LOCALIZER
Try converting to black and white Sep 19, 2019

A_Lilie wrote:

Thanks, I tried ABBYY FineReader, but unfortunately it doesn't seem to recognize colored text


Try enabling Tools → Options → Document → Color mode → Black and white. Might help.


 
A_Lilie
A_Lilie
Germany
Local time: 14:27
Japanese to German
+ ...
TOPIC STARTER
Thank you so much! Sep 19, 2019

esperantisto wrote:
Try enabling Tools → Options → Document → Color mode → Black and white. Might help.


Thank you so much! I think that did do the trick! Yay!
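
What a black-and-white mode does can be sketched roughly as follows. This is not FineReader's actual algorithm, which nobody in this thread describes; it is a plain-Python brightness threshold with made-up names, assuming (red, green, blue) pixels on a near-white page. It shows why light blue or orange text can vanish when the cut-off sits in the middle of the brightness range.

```python
# Minimal sketch of thresholding, assuming pixels are (red, green, blue)
# tuples in the range 0-255 on a near-white background.

def brightness(pixel):
    """Perceived brightness using the common ITU-R BT.601 luma weights."""
    red, green, blue = pixel
    return 0.299 * red + 0.587 * green + 0.114 * blue


def to_black_and_white(rows, threshold=230):
    """Map pixels darker than the threshold to black (0), all others to white (255)."""
    return [[0 if brightness(p) < threshold else 255 for p in row] for row in rows]


WHITE = (255, 255, 255)
LIGHT_BLUE = (173, 216, 230)
ORANGE = (255, 165, 0)
BLACK = (0, 0, 0)

page = [[WHITE, LIGHT_BLUE, ORANGE, BLACK, WHITE]]
print(to_black_and_white(page))                 # [[255, 0, 0, 0, 255]]
print(to_black_and_white(page, threshold=128))  # [[255, 255, 255, 0, 255]]
```

With the high threshold, the light blue and orange pixels end up as black as the black text. With the middle threshold they are washed out to white, which would leave nothing for the recognition step to find.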


 
James Hodges
James Hodges  Identity Verified
Japan
Local time: 22:27
Japanese to English
Bit Late With My Response But.... Sep 20, 2019

For Japanese character OCR alone, the best software I have used is 読取革命 from Panasonic here in Japan. That being said, I don't know if it is still available, or available outside Japan. I should also warn you that it is a bit of a pain to set up. The character recognition is great, the formatting of the text that is read not so much. I should also mention that I've only used it on Windows PCs running Japanese operating systems.

 
A_Lilie
A_Lilie
Germany
Local time: 14:27
Japanese to German
+ ...
TOPIC STARTER
Thanks! Sep 20, 2019

James Hodges wrote:

For Japanese character OCR alone, the best software I have used is 読取革命 from Panasonic here in Japan. That being said, I don't know if it is still available, or available outside Japan. I should also warn you that it is a bit of a pain to set up. The character recognition is great, the formatting of the text that is read not so much. I should also mention that I've only used it on Windows PCs running Japanese operating systems.



Yeah, I heard it's great, and it is still available and updated, but Panasonic specifically mentions compatibility with a Japanese Windows OS and I cannot get it to run on my German one. Formatting wouldn't even matter much. I just need to extract the text (correctly) into a Word file, preferably without having to comb through the whole text afterwards to put it in order (Google often skips from one page to another between text boxes). As long as it recognizes the text and orders it the way it appears in the original PDF file, I'm very happy. ABBYY FineReader is doing a good job right now, though! Recognition could be a little better with vertical text, but it's way better than everything I tried before! I can definitely recommend it for people who can't install Japanese software!
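
The ordering step described above can be sketched in Python, assuming an OCR tool reports each text box with a page number and a position. The `TextBox` fields, the rule for vertical text (columns read right to left), and the choice to list horizontal boxes before vertical ones on each page are all assumptions for illustration; they do not describe the export format of any particular program.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    page: int               # 1-based page number
    left: float             # distance of the box from the left page edge
    top: float              # distance of the box from the top page edge
    text: str
    vertical: bool = False  # True for top-to-bottom Japanese text


def reading_order_key(box):
    """Sort key: page first, then position in the usual reading direction."""
    if box.vertical:
        # Vertical columns are read starting at the right edge of the page.
        return (box.page, 1, -box.left, box.top)
    # Horizontal lines are read from the top down, then left to right.
    # Placing them before vertical boxes is an arbitrary simplification.
    return (box.page, 0, box.top, box.left)


def in_reading_order(boxes):
    """Return the boxes sorted so that text never jumps between pages."""
    return sorted(boxes, key=reading_order_key)


boxes = [
    TextBox(page=2, left=10, top=10, text="page two"),
    TextBox(page=1, left=350, top=50, text="left column", vertical=True),
    TextBox(page=1, left=400, top=50, text="right column", vertical=True),
    TextBox(page=1, left=10, top=10, text="title"),
]
print([box.text for box in in_reading_order(boxes)])
# ['title', 'right column', 'left column', 'page two']
```

A real poster layout would need more than this, since text boxes can overlap or wrap around images, but sorting by page first already avoids the skipping between pages mentioned above.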

There might be workarounds for Japanese software using "virtual machines" (?), which, as I understand it, run a virtual computer on your computer. If you set that virtual one up with a Japanese OS, you should be able to install Japanese software, but that's a little too complicated for me to actually do.

Thank you for your reply, though! I'll definitely look into 読取革命, if I ever get a Japanese OS!


 

