Machine translation

Need help with translating WW1, Inter-War or WW2 related documents or information?
Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#16

Post by Der Alte Fritz » 30 Sep 2016, 18:36

Felix C,
it seems that you have a number of issues here, so dealing with them in turn:

1) Scanning from a downloaded PDF. This depends on the quality of the PDF but should be pretty straightforward. ABBYY FineReader 12 (AFR) opens .pdf and DjVu files and will then analyse and OCR them to produce a text document. I would then save this as HTML, open it in Google and translate on the page. This method allows you to correct the original scan in AFR and then re-save as HTML with little effort. Once you are happy and want an exact copy of the book, save it as an .rtf file, upload it to Google Translator Toolkit and do a serious translation of the document. It helps to keep a glossary of Googlisms so that you can use the Find/Replace feature to make mass changes in one go.
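
If you would rather apply the glossary of Googlisms outside the Toolkit, a short script can do the mass Find/Replace on the saved text. This is only a rough sketch: the glossary entries below are made-up examples of the kind of garbled abbreviation Google produces, and the function name is my own invention, not part of any tool mentioned here.

```python
# Hypothetical glossary: garbled machine-translation phrase -> preferred term.
# The entries are illustrative only; build your own list as you proofread.
GLOSSARY = {
    "field then. d.": "field railways",
    "field w. e.": "field railways",
}

def apply_glossary(text, glossary):
    """Replace every glossary phrase found in text with its preferred term."""
    # Longest phrases first, so a short entry cannot clobber part of a longer one.
    for wrong in sorted(glossary, key=len, reverse=True):
        text = text.replace(wrong, glossary[wrong])
    return text

sample = "more than 4000 km of the field then. d."
print(apply_glossary(sample, GLOSSARY))
```

Keeping the glossary in one place means the same corrections can be re-applied every time you re-save the text from AFR.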

Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#17

Post by Der Alte Fritz » 30 Sep 2016, 18:44

For instance, I have downloaded Ivanov's Russian Field Railways, a 1927 book, from this Russian site:
http://militera.lib.ru/science/0/pdf/ivanov_v01.pdf

which looks like this:
[Attached image: Ivanov.jpg]
I ran it through AFR, saved it straight to .html, used Google's on-page translation, and this is what I got:
INTRODUCTION.

All the recent wars, including the imperialist not only show, but also emphasize a responsible role and importance of field (portable) railways in the theater of operations, and if as a prelude to the analysis of our theme to look back a little, we can see that in 1917 on the Russian front was in the exploit-tadii more than 2000 km, and the French more than 4000 km of the field then. d.

As is well known, in the imperialist war automobile communication was the most widespread on the French front, where it was promoted by a well-developed automotive industry and a wide network of roads.

In the service of the French army consisted of more than 100,000 cars, and yet along with them on the same front, we find a highly developed network of field w. e., when, on what historical information such use took place there in position, so still and the moving of the war.

If roughly calculate the total number of kilometers built and dismantled railway field. d. in the same war, the same Russian front, it is more than 6000 km, the latter was limited to the specified number only by virtue of the complete exhaustion of the available stocks of the field then. e., including requisitioned in the factories and seized from the property of the enemy, reach a fairly large numbers.


Felix C
Member
Posts: 1201
Joined: 04 Jul 2007, 17:25
Location: Miami, Fl

Re: Machine translation

#18

Post by Felix C » 30 Sep 2016, 19:01

OK, I am purchasing ABBYY this weekend. I did not know whether it worked as an OCR for Cyrillic text. Many thanks.
I still have that issue with books which are bound and stiff at the spine, so they do not open fully. Scanning them is difficult with a flatbed.

Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#19

Post by Der Alte Fritz » 30 Sep 2016, 23:39

To do the whole book of 115 pages would take less than 10 minutes for the scanning, plus maybe two hours for a basic 'verify text' pass to clean up the text a little. The result is at least good enough for me to decide where to invest my time in a proper line-by-line translation.

2) Scanning a bound book is a different proposition, and flatbed scanners are really not ideal.
a) A Book Scanner can be made using a couple of compact cameras and a whole lot of woodwork https://www.diybookscanner.org/
b) You can buy a cheap book scanner which works by scanning right up to the edge of the platen while the book hangs over the edge of the scanner (so the spine sits on the edge of the platen). https://www.youtube.com/watch?v=A4MAWeLzybY
c) You can buy a very expensive planetary book scanner or rent one through a scanning service
d) You can buy one of the small hand-held scanners
http://www.irislink.com/EN-ROW/c968/IRI ... AlBK8P8HAQ
not sure how efficient or accurate they are
e) Just buy yourself a decent digital camera with a good zoom function. With a high file size setting I get 600-700 dpi, which is sufficient even for small print and more than enough for normal 10 pt print in a modern book. Stick to diffuse light from the side rather than a flash and you can get really good images for your software to work from. Several memory cards will allow you to keep shooting while you put the other card into the laptop and download the images.
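
To check whether a particular camera will get into that 600-700 dpi range, divide the pixels along the long side of the image by the length of the page in inches. Here is a rough sketch of that arithmetic; the camera resolution, page size and fill fraction are assumed example figures, not measurements from my own set-up.

```python
def effective_dpi(pixels_long_side, page_long_side_inches, fill_fraction=1.0):
    """Pixels per inch on the page.

    fill_fraction is the share of the frame's long side that the page occupies.
    """
    return pixels_long_side * fill_fraction / page_long_side_inches

# Assumed example: a 24-megapixel camera (6000 x 4000 px) photographing
# a page 9 inches tall that fills 95% of the frame height.
print(round(effective_dpi(6000, 9.0, 0.95)))  # prints 633
```

The further you zoom out, the smaller the fill fraction and the lower the dpi, so frame the page as tightly as you can.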

When 'scanning' large or precious books I usually photograph one page at a time and rest one side of the book up against a vertical surface (such as a library carrel dividing wall) so that it is open no more than 90 degrees. Photographing downwards gives you a flat page (so no distortion) and you can easily turn to the next page. Shoot the odd pages of the book first and then the even pages, with sequential numbering in the camera. Once in the software, AFR allows you to re-sort the pages from 1,3,5,7 and 2,4,6,8 into 1,2,3,4,5,6,7,8.
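
If you want the image files themselves in reading order before they go anywhere near the OCR software, the same re-sort can be done with a few lines of script. This is a minimal sketch that assumes the camera wrote one run of odd pages followed by one run of even pages; the file names are invented for the example.

```python
from itertools import chain, zip_longest

def interleave_pages(odd_pages, even_pages):
    """Merge [p1, p3, p5, ...] and [p2, p4, p6, ...] into reading order."""
    merged = chain.from_iterable(zip_longest(odd_pages, even_pages))
    # zip_longest pads the shorter run with None; drop that padding.
    return [page for page in merged if page is not None]

odd = ["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0003.jpg"]  # pages 1, 3, 5
even = ["IMG_0004.jpg", "IMG_0005.jpg"]                 # pages 2, 4
print(interleave_pages(odd, even))
```

It copes with a book that ends on an odd page, where the odd run is one image longer than the even run.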

Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#20

Post by Der Alte Fritz » 30 Sep 2016, 23:44

You will find that modern programmes such as AFR can handle almost all alphabets. The one real exception seems to be Fraktur but, as I posted earlier, this can be done through the ABBYY website and their dedicated Fraktur project.

I have had no problems with Cyrillic from a whole variety of Russian books from C19th through Soviet (terrible paper) to modern typefaces.

Felix C
Member
Posts: 1201
Joined: 04 Jul 2007, 17:25
Location: Miami, Fl

Re: Machine translation

#21

Post by Felix C » 19 Oct 2017, 17:05

To add to this thread.
I have two books in Cyrillic which are very faint DjVu scans. Do I need to convert the DjVu to PDF before running the OCR?

Sorry I would like to do it correctly in the first instance. If you have experience then kindly share.

BTW, the Fraktur conversion with ABBYY works great.

Take Care

This is a great thread
Last edited by Felix C on 19 Oct 2017, 18:52, edited 1 time in total.

Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#22

Post by Der Alte Fritz » 19 Oct 2017, 17:31

No, if you have ABBYY FineReader 12 or 14, it will open DjVu files directly and will analyse and read the document just fine. You can then save individual pages or groups of pages as images or as a PDF.

smetanin albert
Member
Posts: 4952
Joined: 15 Jun 2003, 19:08
Location: Russia

Re: Machine translation

#23

Post by smetanin albert » 19 Oct 2017, 17:39

ABBYY FineReader reads this format (DjVu) perfectly.

Felix C
Member
Posts: 1201
Joined: 04 Jul 2007, 17:25
Location: Miami, Fl

Re: Machine translation

#24

Post by Felix C » 19 Oct 2017, 18:54

Thanks, gentlemen.
Does anyone know how sites like militera.lib.ru are able to scan old books with faint text so well? What system do they use?

Der Alte Fritz
Member
Posts: 2171
Joined: 13 Dec 2007, 22:43
Location: Kent United Kingdom

Re: Machine translation

#25

Post by Der Alte Fritz » 19 Oct 2017, 21:46

Professional book scanners cost tens of thousands of pounds, e.g. http://unionovo.eu/ or http://www.genusit.com/products/imaging ... -scanners/. For that you get perfect lighting, high-quality digital cameras to capture large digital images, and sophisticated software to remove distortions and correct errors.

However, you could build your own (https://www.diybookscanner.org/), as the cameras can be bought off the shelf and similar software is available as open source.
Or you can buy a kit for $1,000 - $1,700 https://store.diybookscanner.org/
