Pamyat Naroda indexes. 3,400,000+ records processed

Discussions on archives and similar issues. Hosted by John Calvin and Jeff Leach.
AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 09 Jul 2016 22:38

Hi

I have finished processing the catalogues of TsAMO archival operational records hosted on the
https://pamyat-naroda.ru/ site.
Processed 3,430,807 records.
They are in Excel 2007 format, split into 4 parts.

http://rusfolder.com/45145567
http://rusfolder.com/45145566
http://rusfolder.com/45145568
http://rusfolder.com/45145569

All texts are in Russian. Field names are given as they are.
There are no direct links, but using the site's search engine one can easily find any document.
I hope you have no problems with the download.
Later I can mirror those files on my site.

P.S. Don't download any download managers from their site!
The download routine is as follows:
1. Check here
Image
2. Press the left button, "скачать" ("download")
3. Click on any ad link
4. Wait about 30 seconds
5. Enter the captcha
6. Download

Regards
Alex

G. Trifkovic
Forum Staff
Posts: 2201
Joined: 06 Nov 2004 19:26
Location: The South-East

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by G. Trifkovic » 30 Jul 2016 10:43

Hi AMVAS,

and thanks for the index.

Best,

G.

Jeff Leach
Host - Archive section
Posts: 1250
Joined: 19 Jan 2010 09:08
Location: Stockholm, Sweden

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Jeff Leach » 30 Jul 2016 14:36

Yes, thanks for the indexes. Once you figure out how to filter the results, they are quite useful.

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 30 Jul 2016 19:42

Jeff Leach wrote:Yes, thanks for the indexes. Once you figure out how to filter the results, they are quite useful.
Moreover, Jeff, there is quite an easy way to download entire dossiers using direct links and a download manager :)
But right now I'm not ready to share it :roll:

Eugen Pinak
Member
Posts: 889
Joined: 16 Jun 2004 16:09
Location: Kyiv, Ukraine

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Eugen Pinak » 31 Jul 2016 13:13

AMVAS - thank you very much for your work :thumbsup:
AMVAS wrote:
Jeff Leach wrote:Yes, thanks for the indexes. Once you figure out how to filter the results, they are quite useful.
Moreover, Jeff, there is quite an easy way to download entire dossiers using direct links and a download manager :)
But right now I'm not ready to share it :roll:
Ha! Too late for me anyway, as I've downloaded all I want one by one, even using that weird "hacking" to get pages above the 10th. But maybe you've also found a way to mass-download files from germandocs... :wink:

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 31 Jul 2016 13:55

Eugen Pinak wrote:AMVAS - thank you very much for your work :thumbsup:
AMVAS wrote:
Ha! Too late for me anyway, as I've downloaded all I want one by one, even using that weird "hacking" to get pages above the 10th. But maybe you've also found a way to mass-download files from germandocs... :wink:
I doubt you downloaded 7-10 TB of their content :D
They have finally fixed that bug with 10+ pages. Around May 9th they introduced a new search page. It is a bit better than the old one, but still not as powerful as we would like.

Look at rutracker.org for germandocs documents. They have plenty of them.
It's not my subject, so I haven't looked into the possibility of getting copies en masse from that site.
As far as I can see, they use a much simpler engine than pamyat-naroda, with direct links like
http://wwii.germandocsinrussia.org/pages/112204/zooms/8 for full-scale pages.
So I don't think downloading documents from them will be much of a problem.
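
A minimal sketch of how such direct links could be built, assuming the pattern of the single example URL above (`/pages/<page id>/zooms/<zoom level>`) holds for other pages too. The function name `page_image_url` is hypothetical, and treating zoom level 8 as the full-scale default is taken from that one example only.

```python
BASE_URL = "http://wwii.germandocsinrussia.org"


def page_image_url(page_id: int, zoom: int = 8) -> str:
    """Build a direct page link following the pattern of the example URL.

    Assumption: every page is addressed as /pages/<page_id>/zooms/<zoom>.
    """
    if page_id <= 0:
        raise ValueError("page_id must be a positive integer")
    return f"{BASE_URL}/pages/{page_id}/zooms/{zoom}"


# Reproduces the link quoted in the post.
print(page_image_url(112204))
```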

Regards
Alex

Eugen Pinak
Member
Posts: 889
Joined: 16 Jun 2004 16:09
Location: Kyiv, Ukraine

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Eugen Pinak » 31 Jul 2016 14:32

AMVAS wrote:I doubt you downloaded 7-10 TB of their content :D
Certainly not, but I've decided for myself that I shall download only data relevant to my interests. Therefore I've limited myself to downloading various OOB and TOE data, without venturing any further.
AMVAS wrote:Look at rutracker.org for germandocs documents. They have plenty of them.
"Semion Semionovich..." (c) :oops:
AMVAS wrote:As far as I can see, they use a much simpler engine than pamyat-naroda, with direct links like
http://wwii.germandocsinrussia.org/pages/112204/zooms/8 for full-scale pages.
So I don't think downloading documents from them will be much of a problem.
Indeed, getting a direct link to the file is relatively easy, but the filenames are random, so renumbering them in the proper order would take more time than downloading them one by one :( Of course, maybe somebody has already solved this, but I have no idea how to do it :(

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 31 Jul 2016 20:40

Eugen Pinak wrote:
AMVAS wrote:I doubt you downloaded 7-10 TB of their content :D
Certainly not, but I've decided for myself that I shall download only data relevant to my interests. Therefore I've limited myself to downloading various OOB and TOE data, without venturing any further.
Aha, I see...

"Semion Semionovich..." (c)
:oops:
Indeed, getting a direct link to the file is relatively easy, but the filenames are random, so renumbering them in the proper order would take more time than downloading them one by one :( Of course, maybe somebody has already solved this, but I have no idea how to do it :(
One needs to collect the links to the pages of every dossier first, and only then gather those pages into dossier folders. Not a job for manual downloading.
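
The two-step idea (collect links per dossier, then file the pages into dossier folders) can be sketched roughly as below. This is an illustration under assumptions, not the method actually used: the input is assumed to be a list of `(dossier_id, page_url)` pairs already gathered in reading order, the `.jpg` extension is a guess, and the function name and sample values are hypothetical. Giving each page a zero-padded sequence number sidesteps the random server-side filenames.

```python
from pathlib import PurePosixPath


def plan_downloads(page_links, ext="jpg"):
    """Map each page URL to a target path '<dossier>/<sequence>.<ext>'.

    page_links: iterable of (dossier_id, page_url), in reading order.
    Returns a list of (page_url, target_path) pairs grouped by dossier.
    """
    by_dossier = {}
    for dossier_id, url in page_links:
        by_dossier.setdefault(dossier_id, []).append(url)

    plan = []
    for dossier_id, urls in by_dossier.items():
        # Pad to at least 3 digits so names sort in page order.
        width = max(3, len(str(len(urls))))
        for seq, url in enumerate(urls, start=1):
            target = PurePosixPath(str(dossier_id)) / f"{seq:0{width}d}.{ext}"
            plan.append((url, str(target)))
    return plan


# Hypothetical sample data: two dossiers with interleaved page links.
links = [("dossier_A", "u/907"), ("dossier_B", "u/15"), ("dossier_A", "u/12")]
for url, target in plan_downloads(links):
    print(url, "->", target)
```

The plan only decides names; the actual fetching can then be handed to any download manager that accepts a URL and an output path.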

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 31 Jul 2016 20:42

P.S. Maybe some offline browser will work for this site.

Eugen Pinak
Member
Posts: 889
Joined: 16 Jun 2004 16:09
Location: Kyiv, Ukraine

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Eugen Pinak » 31 Jul 2016 21:21

AMVAS wrote:P.S. Maybe some offline browser will work for this site.
Maybe. But anyway, navsource from Rutracker has already uploaded all the folders from germandocsinrussia.org - bless him! :)

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 31 Jul 2016 21:32

Yep ))

Mori
Member
Posts: 750
Joined: 25 Oct 2014 11:04
Location: Europe

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Mori » 02 Aug 2016 11:03

Eugen Pinak wrote:But maybe you've also found a way to mass-download files from germandocs... :wink:
That I've got, and everything is replicated locally now. (Happy to share, as usual.)

When I checked rutracker some months ago, there were some very convenient torrents of PDF compilations (which John Calvin also copied to his FTP). But I am not sure whether the updates published since then were processed too. Anyway, it was not too difficult to find an automated way to download the lot from the site itself.

AMVAS
Member
Posts: 520
Joined: 02 Aug 2004 13:58
Location: Moscow

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by AMVAS » 02 Aug 2016 15:46

Mori wrote:
Eugen Pinak wrote:But maybe you've also found a way to mass-download files from germandocs... :wink:
That I've got, and everything is replicated locally now. (Happy to share, as usual.)

When I checked rutracker some months ago, there were some very convenient torrents of PDF compilations (which John Calvin also copied to his FTP). But I am not sure whether the updates published since then were processed too. Anyway, it was not too difficult to find an automated way to download the lot from the site itself.
I have been working toward downloading full dossiers for about a year; I had no time to do it earlier.
The major problem is not even the downloading, but building a logical structure for those records.
Their main unit is the document, but they don't assign those documents to dossiers in any attribution available to the user.
The Rutracker files are good enough, but they have the same disadvantage: poor structure. And without structure it's a useless load.
Right now I have everything I need: software for indexing, analysis and downloading what I need :D

I got some surprises. For example, there are maps which are not indexed by the official search engine! One can get access to them only through direct links (which an ordinary user doesn't have, of course!)
So those maps are invisible!

Jeff Leach
Host - Archive section
Posts: 1250
Joined: 19 Jan 2010 09:08
Location: Stockholm, Sweden

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Jeff Leach » 02 Aug 2016 18:39

I've downloaded about 2000 pages, one page at a time. It wasn't a complete waste of time, because I was forced to look at each document, which allowed some initial evaluation of them.

After some trial and error, I settled on naming folders

fXXX opXXX dXXX.

It is the documents from the 'd = delo' that are collected in each folder. The documents in each folder had to be given new numbers to make sure the pages of each document were kept together.
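
A rough sketch of that naming convention, for illustration only. The folder pattern `fXXX opXXX dXXX` comes from the post; the exact renumbering scheme is not described there, so the `<document>_<page>` file pattern, the `.jpg` extension, the helper names and the sample numbers are all assumptions.

```python
def delo_folder(fond: int, opis: int, delo: int) -> str:
    """Folder name in the 'fXXX opXXX dXXX' style described above."""
    return f"f{fond} op{opis} d{delo}"


def page_filename(doc_no: int, page_no: int, ext: str = "jpg") -> str:
    """Zero-padded '<document>_<page>' name (assumed scheme).

    Padding makes a plain alphabetical sort keep the pages of each
    document together and in order.
    """
    return f"{doc_no:03d}_{page_no:03d}.{ext}"


# Hypothetical fond/opis/delo numbers, not taken from the index.
print(delo_folder(229, 161, 112))
print(sorted(page_filename(d, p) for d, p in [(10, 1), (2, 3), (2, 1)]))
```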

The biggest problem with the search engine is that it will only display 100 documents at a time. Mass downloading isn't an issue for me. I have about 220,000 pages of wartime German documents, and after five years I have only looked at about 10% and read maybe 2 - 3%.

Sorry, I've been rambling. What I wanted to say is that I hope AMVAS will share some of the hidden files. Anything dealing with the South or Southwest Front, 22 June 1941 - 30 October 1941, would be really appreciated.

Mori
Member
Posts: 750
Joined: 25 Oct 2014 11:04
Location: Europe

Re: Pamyat Naroda indexes. 3,400,000+ records processed

Post by Mori » 02 Aug 2016 20:44

AMVAS wrote: The major problem is not even the downloading, but building a logical structure for those records.
Their main unit is the document, but they don't assign those documents to dossiers in any attribution available to the user.
I took the easy way out: I store files under a folder named by the Number and the Title of the document. It takes an extra copy/paste of said title, as well as one manual sequence of 4 clicks per document (not per page). In the end, that's not 100% automatic, as there are ca. 8 clicks per document, but it has the valuable benefit of ease of identification & navigation.

Also, my method does not create any problem sequencing the pages, even within a folder.
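
For anyone wanting to script the "Number and Title" folder naming, a minimal sketch follows. It is not the tool described above: the function name, the 80-character limit and the sample title are assumptions, and the character set replaced is simply the one Windows forbids in file names.

```python
import re


def document_folder(number, title, max_len=80):
    """Build a folder name '<number> <title>' that is safe on common file systems."""
    # Replace characters that Windows does not allow in file names.
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title)
    # Collapse runs of whitespace left over from copy/paste.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Truncate, then drop trailing spaces and dots, which Windows rejects.
    return f"{number} {cleaned}"[:max_len].rstrip(" .")


# Hypothetical document number and title.
print(document_folder("123", 'Report: 1/2  "draft"'))
```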
