BA / MA digital files

Discussions on archives and similar issues. Hosted by Jeff Leach.
Melax
Member
Posts: 384
Joined: 22 Aug 2020 18:00
Location: Germany

Re: BA / MA digital files

Post by Melax » 16 Feb 2024 09:59

Does anybody know how often you can submit a scan-on-demand request in Freiburg? 5 times every time your order has been fulfilled? 5 times per year/quarter/month?

TH
Member
Posts: 769
Joined: 08 Mar 2007 23:34
Location: Germany

Re: BA / MA digital files

Post by TH » 16 Feb 2024 12:09

Melax wrote:
16 Feb 2024 09:59
5 times every time your order has been fulfilled?
I don't know, but that's what I assume.

TH
Member
Posts: 769
Joined: 08 Mar 2007 23:34
Location: Germany

Re: BA / MA digital files

Post by TH » 23 Feb 2024 00:45

To anyone with an X account: Has the Bundesarchiv mentioned anything about Invenio's file viewer being broken since Tuesday?

diciassette2000
Member
Posts: 91
Joined: 29 Aug 2003 20:05
Location: Switzerland

Re: BA / MA digital files

Post by diciassette2000 » 23 Feb 2024 08:27

I noticed it too, but I haven't read anything about it... I don't know what they are doing. Hm...
All the best
Maurizio

Boby
Member
Posts: 2762
Joined: 19 Nov 2004 17:22
Location: Spain

Re: BA / MA digital files

Post by Boby » 02 Mar 2024 01:03

Why scan even the reverse of documents when there is nothing on it? 40-50% of the files are just useless. 8O

Christian Ankerstjerne
Forum Staff
Posts: 14043
Joined: 10 Mar 2002 14:07
Location: Denmark

Re: BA / MA digital files

Post by Christian Ankerstjerne » 02 Mar 2024 18:17

Boby wrote:
02 Mar 2024 01:03
Why scan even the reverse of documents when there is nothing on it? 40-50% of the files are just useless. 8O
They aren't useless. In many cases they do not offer much value to historians, but they may still provide value to archeologists. Furthermore, the reverse sometimes presents interesting information:
  • The use of the back of old maps to conserve paper, illustrating the paper shortage of the Third Reich.
  • Watermarks are sometimes visible, especially when adjusting brightness and contrast.
  • The crossed-out parts of typed documents can sometimes be more easily read by looking at the back.
Rather than carefully evaluating the value of the reverse of each piece of paper, it is better and more efficient to scan all of them.

Eax-E
Member
Posts: 865
Joined: 08 Jun 2010 17:58

Re: BA / MA digital files

Post by Eax-E » 02 Mar 2024 22:06

Boby wrote:
02 Mar 2024 01:03
Why scan even the reverse of documents when there is nothing on it? 40-50% of the files are just useless. 8O
It is a standard archival method. It proves that the back of the page was not forgotten in the digitization process. Then we know there is no missing information.

Kr

Boby
Member
Posts: 2762
Joined: 19 Nov 2004 17:22
Location: Spain

Re: BA / MA digital files

Post by Boby » 03 Mar 2024 15:24

Eax-E wrote:
02 Mar 2024 22:06
Boby wrote:
02 Mar 2024 01:03
Why scan even the reverse of documents when there is nothing on it? 40-50% of the files are just useless. 8O
It is a standard archival method. It proves that the back of the page was not forgotten in the digitization process. Then we know there is no missing information.

Kr
By whom? I know the NARA microfilms and the TsAMO f. 500 digital collection, and this method was not used there. The same goes for the UK Discovery TNA files, and ditto for the Spanish documents at AGA and AGS.

Boby,

Sean Oliver
Member
Posts: 177
Joined: 14 Sep 2007 18:18
Location: Wisconsin USA

Re: BA / MA digital files

Post by Sean Oliver » 06 Mar 2024 05:27

The problem with including scans of blank pages is that it doubles both the file size and download time for no good reason. Furthermore, if the user intends to OCR the text, each blank page must be deleted manually before OCR is done. This is a tedious and frustrating chore when files consist of many hundreds of pages but only half of them actually contain text/information. So far as I know, BAMA is the only online archive to scan blank pages and upload them.
Is this really necessary simply to prove to users that all pages have been scanned?
In fact, BAMA does not include all pages of many files. For example, the RH 10 Gen Insp Pz Truppen formation files contain the all-important monthly unit organization charts, which are often larger than A4. Nonetheless, BAMA decided not to bother scanning these sheets and including them in their online files, because scanning large-format sheets was apparently too much trouble. They inserted a sheet of paper explaining that these would be scanned and added later. However, it seems that over the past 2-3 years BAMA has forgotten all about scanning these sheets.
But at least we can be reassured that every one of the thousands of normal-sized pages with absolutely nothing on them has been scanned perfectly. :roll:

Christian Ankerstjerne
Forum Staff
Posts: 14043
Joined: 10 Mar 2002 14:07
Location: Denmark

Re: BA / MA digital files

Post by Christian Ankerstjerne » 06 Mar 2024 06:50

Sean Oliver wrote:
06 Mar 2024 05:27
The problem with including scans of blank pages is that it doubles both the file size and download time for no good reason. Furthermore, if the user intends to OCR the text, each blank page must be deleted manually before OCR is done. This is a tedious and frustrating chore when files consist of many hundreds of pages but only half of them actually contain text/information. So far as I know, BAMA is the only online archive to scan blank pages and upload them.
Is this really necessary simply to prove to users that all pages have been scanned?
In fact, BAMA does not include all pages of many files. For example, the RH 10 Gen Insp Pz Truppen formation files contain the all-important monthly unit organization charts, which are often larger than A4. Nonetheless, BAMA decided not to bother scanning these sheets and including them in their online files, because scanning large-format sheets was apparently too much trouble. They inserted a sheet of paper explaining that these would be scanned and added later. However, it seems that over the past 2-3 years BAMA has forgotten all about scanning these sheets.
But at least we can be reassured that every one of the thousands of normal-sized pages with absolutely nothing on them has been scanned perfectly. :roll:
There are other reasons for scanning the backs besides simply proving that everything has been scanned (see my previous reply). If the staff had to assess each page individually, the digitization process would take much longer. It's much more efficient to simply scan everything.

Besides, except for backs that have maps printed on them or are otherwise patterned, JPEG compression means that the blank pages should not take up nearly as much hard drive space as the fronts.

Sean Oliver
Member
Posts: 177
Joined: 14 Sep 2007 18:18
Location: Wisconsin USA

Re: BA / MA digital files

Post by Sean Oliver » 06 Mar 2024 10:06

Christian Ankerstjerne wrote:
06 Mar 2024 06:50
Sean Oliver wrote:
06 Mar 2024 05:27
The problem with including scans of blank pages is that it doubles both the file size and download time for no good reason. Furthermore, if the user intends to OCR the text, each blank page must be deleted manually before OCR is done. This is a tedious and frustrating chore when files consist of many hundreds of pages but only half of them actually contain text/information. So far as I know, BAMA is the only online archive to scan blank pages and upload them.
Is this really necessary simply to prove to users that all pages have been scanned?
In fact, BAMA does not include all pages of many files. For example, the RH 10 Gen Insp Pz Truppen formation files contain the all-important monthly unit organization charts, which are often larger than A4. Nonetheless, BAMA decided not to bother scanning these sheets and including them in their online files, because scanning large-format sheets was apparently too much trouble. They inserted a sheet of paper explaining that these would be scanned and added later. However, it seems that over the past 2-3 years BAMA has forgotten all about scanning these sheets.
But at least we can be reassured that every one of the thousands of normal-sized pages with absolutely nothing on them has been scanned perfectly. :roll:
There are other reasons for scanning the backs besides simply proving that everything has been scanned (see my previous reply). If the staff had to assess each page individually, the digitization process would take much longer. It's much more efficient to simply scan everything.

Besides, except for backs that have maps printed on them or are otherwise patterned, JPEG compression means that the blank pages should not take up nearly as much hard drive space as the fronts.
It might seem that way, yet when you examine the size of each JPEG file, the blank sides are just as large as the text sides.
It appears that most of the BAMA files have pages bound in book form, so when the pages are photographed with an overhead camera (not scanned), there are 2 pages facing the camera: the text on the right side, and the blank back of the previous page on the left. This is why both are photographed. Document processing software then automatically splits the open-book image into 2 separate pages, one blank and the other with text. Then the page is turned, the next 2 pages are photographed, and so on. Apparently it is too much trouble for BAMA to set their document software to delete the blank pages.

Piet Duits
Member
Posts: 855
Joined: 18 Apr 2002 21:07
Location: Oudenbosch, Netherlands

Re: BA / MA digital files

Post by Piet Duits » 06 Mar 2024 10:21

Yeah, so we have to get used to it.

Christian Ankerstjerne
Forum Staff
Posts: 14043
Joined: 10 Mar 2002 14:07
Location: Denmark

Re: BA / MA digital files

Post by Christian Ankerstjerne » 06 Mar 2024 16:31

Sean Oliver wrote:
06 Mar 2024 10:06
It might seem that way, yet when you examine the size of each JPEG file, the blank sides are just as large as the text sides.
It appears that most of the BAMA files have pages bound in book form, so when the pages are photographed with an overhead camera (not scanned), there are 2 pages facing the camera: the text on the right side, and the blank back of the previous page on the left. This is why both are photographed. Document processing software then automatically splits the open-book image into 2 separate pages, one blank and the other with text. Then the page is turned, the next 2 pages are photographed, and so on. Apparently it is too much trouble for BAMA to set their document software to delete the blank pages.
I don't know their exact process, but I don't see how it could be automated. Some documents have text on both sides and, as described before, some of the backs of the documents may have historical value even if they contain no text.

MarkN
Member
Posts: 2631
Joined: 12 Jan 2015 13:34
Location: On the continent

Re: BA / MA digital files

Post by MarkN » 06 Mar 2024 17:38

Sean Oliver wrote:
06 Mar 2024 05:27
The problem with including scans of blank pages is that it doubles both the file size and download time for no good reason. Furthermore, if the user intends to OCR the text, each blank page must be deleted manually before OCR is done. This is a tedious and frustrating chore when files consist of many hundreds of pages but only half of them actually contain text/information. So far as I know, BAMA is the only online archive to scan blank pages and upload them.
Anybody can avoid that tedious and frustrating chore by visiting BAMA themselves and copying only the pages they want. If anybody thinks that is also too tedious and too frustrating a chore to do themselves, they can contact a (local) researcher to do it for them.

Sigh!

An archive goes to the bother of making high-quality scans of documents available to the public free of charge, and yet there are some who think they are entitled to even more.

Piet Duits
Member
Posts: 855
Joined: 18 Apr 2002 21:07
Location: Oudenbosch, Netherlands

Re: BA / MA digital files

Post by Piet Duits » 06 Mar 2024 17:46

I have the same opinion as you, Mark.

I have already removed thousands of blank pages, and I do it with a smile from ear to ear, knowing that the other pages contain a shitload of valuable information. So, remove I will.
