Free Republic
Browse · Search
General/Chat
Topics · Post Article


Sunday Afternoon Computer Question: How to Archive a Website?
Me

Posted on 06/16/2024 4:00:43 PM PDT by Paul R.

A relatively small web forum I am a member of is shutting down soon. There are a lot of posts on it that I and other members would like to archive to a hard drive for reference. Saving individual pages is VERY time-consuming, and we only have until the end of the month. Do any of our FReeper computer gurus have any experience with this?
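For anyone comfortable with a little scripting, the core of what mirroring tools such as HTTrack, SiteSucker, or GNU Wget (with its `--mirror` and `--wait` options) do can be sketched in standard-library Python: fetch a page, save it, queue the same-site links it contains, and pause before each request. This is a minimal sketch under stated assumptions, not a replacement for those tools: it saves HTML only (no images or stylesheets), ignores query strings when naming files, stops at the first HTTP error, and the user-agent string is a hypothetical placeholder.

```python
import os
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Assumption: hypothetical user-agent name; identify your downloader honestly.
USER_AGENT = "personal-archive-sketch"


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def extract_links(html, page_url):
    """Absolute links found on the page, with #fragments removed."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(page_url, href))[0] for href in collector.hrefs]


def same_site(url, root_url):
    """True for http(s) URLs on the same host as root_url."""
    u, r = urlparse(url), urlparse(root_url)
    return u.scheme in ("http", "https") and u.netloc == r.netloc


def local_path(url, out_dir):
    """Map a URL path to a file under out_dir (query strings are ignored)."""
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"
    elif not os.path.splitext(path)[1]:
        path += ".html"
    return os.path.join(out_dir, *path.lstrip("/").split("/"))


def crawl(start_url, out_dir, delay_seconds=5.0, max_pages=500):
    """Breadth-first, one-request-at-a-time download of same-site HTML pages."""
    queue, seen, saved = deque([start_url]), {start_url}, 0
    while queue and saved < max_pages:
        url = queue.popleft()
        time.sleep(delay_seconds)  # wait before every request to go easy on the server
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        # urlopen raises HTTPError on 4xx/5xx, which ends this sketch's crawl.
        with urllib.request.urlopen(request, timeout=30) as response:
            if "text/html" not in response.headers.get("Content-Type", ""):
                continue  # skip images, PDFs, etc. in this sketch
            html = response.read().decode("utf-8", errors="replace")
        target = local_path(url, out_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as f:
            f.write(html)
        saved += 1
        for link in extract_links(html, url):
            if same_site(link, start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Usage would look like `crawl("https://forum.example.com/", "forum_archive", delay_seconds=5)`, where the URL is a placeholder. A generous delay matters: as later posts in this thread show, the forum's server started refusing requests after a dozen or so pages.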


TOPICS: Computers/Internet; Reference
KEYWORDS: archive; internet; save; vanity; website
To: Paul R.

No... I would think it means it is a WordPress site, but the login screen is secured. A “not found” would indicate it is not a WordPress site. This is my best guess from afar.
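The reasoning here maps an HTTP status code to a guess about the site's software. A sketch of that check, assuming the probe in question was of WordPress's standard login path `/wp-login.php` (the earlier post being answered is not shown here). An HTTP error is still a useful answer, so the helper returns the status instead of raising. The function names are hypothetical, and servers that reject HEAD requests or redirect oddly will give an inconclusive result.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=15):
    """Return the HTTP status for url, including error codes like 403 or 404."""
    # Assumption: a HEAD request is enough; some servers answer 405 to HEAD.
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "wp-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # redirects are followed, so this is the final page
    except urllib.error.HTTPError as err:
        return err.code


def wordpress_hint(status):
    """Rough reading of the status returned for <site>/wp-login.php (assumed path)."""
    if status == 200:
        return "login page reachable: probably WordPress"
    if status in (401, 403):
        return "path exists but is protected: possibly WordPress"
    if status == 404:
        return "not found: probably not WordPress"
    return "inconclusive"
```

For example, `wordpress_hint(probe_status("https://forum.example.com/wp-login.php"))` with a placeholder URL.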


21 posted on 06/17/2024 6:08:35 AM PDT by bankwalker (Repeal the 19th ...)
[In reply to #15]

To: Paul R.; All

The site admin says the site might be 10-20 GB. That seems like too small an estimate, but then again it’s only images (pics) and text.

I tried saving a recent page with several embedded pics to PDF and Word, and both the PDF and .doc files were ~3 MB. Embedded links appear to work (at least so long as the site is still up). Results both ways were a bit wonky (many blank pages created in Word, “chopped up” oddly in PDF form).


22 posted on 06/17/2024 8:09:01 AM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #1]

To: Paul R.

Paul,

I started downloading the site, and after successfully downloading a dozen or so pages, the server blocked my access with a “Code 401 (forbidden).”

The server admin / webmaster would have to tweak the server’s security settings to allow the site to be downloaded by something like SiteSucker.
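One common reason a server cuts off a downloader after a dozen or so pages is rate limiting. Before trying again, it may be worth reading the site's `robots.txt`, which can state a `Crawl-delay` and paths that crawlers should avoid. Python's standard `urllib.robotparser` can interpret it; the sketch below assumes the file's text has already been fetched, and the user-agent name is hypothetical. Many servers also enforce limits that `robots.txt` never mentions, so honoring it is necessary but possibly not sufficient.

```python
from urllib.robotparser import RobotFileParser

# Assumption: hypothetical crawler name used for the robots.txt lookups.
USER_AGENT = "personal-archive-sketch"


def load_rules(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_delay(rules, agent=USER_AGENT, default=5.0):
    """Seconds between requests: the site's Crawl-delay if set, else default."""
    delay = rules.crawl_delay(agent)
    return float(delay) if delay is not None else default


def allowed(rules, url, agent=USER_AGENT):
    """Whether robots.txt permits agent to fetch url."""
    return rules.can_fetch(agent, url)
```

To check the live file instead, `RobotFileParser("https://forum.example.com/robots.txt")` followed by `.read()` works (placeholder URL); note that `read()` treats a 401 or 403 on `robots.txt` itself as "disallow everything".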


23 posted on 06/17/2024 8:56:15 AM PDT by Theo (FReeping since 1997 ... drain the swamp.)
[In reply to #10]

To: Theo; Pollard
I started downloading the site, and after successfully downloading a dozen or so pages, the server blocked my access with a “Code 401 (forbidden).”

The server admin / webmaster would have to tweak the server’s security settings to allow the site to be downloaded by something like SiteSucker.

Well, darn.

I guess I can try asking about that...

The Admin seems concerned about copyright issues / possible challenges from forum members, but all the members who have commented so far seem concerned only with keeping access to all the info. It would all just be for personal use anyway, but I may not get anywhere with the Admin... I can certainly poll the forum to see if anyone objects to their content being downloaded.

Thanks for the info, in any event!

Pollard, your take? (Thanks!)

24 posted on 06/17/2024 7:05:03 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #23]

To: Paul R.

Strictly personal use, with only textual content shared by you with other people in your own words? No biggie, IMHO.

Bigger question might be how to keep the site going?


25 posted on 06/17/2024 8:23:11 PM PDT by Pollard (Will work for high tunnel money!)
[In reply to #24]

To: Pollard

A valid question. Or maybe just pass a thumb drive around to all the interested members so they have copies of the archived info.

Some members have suggested something like a Facebook group, but some, myself included, don’t “do” that form of social media.


26 posted on 06/18/2024 12:29:20 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #25]

To: Paul R.

The tool I tried gave me an Error 403 (denied), so I guess he’s onto us. I would ask him what it would take to leave it up as an archive, if nothing else. No new posts, just static.

Your alternative is to surf like a madman and copy everything you can before he shuts it down.

Has he given a reason for shutting it down?
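For the record, HTTP 401 means "Unauthorized" and 403 means "Forbidden"; either way the server is refusing the request, and hammering it with retries tends to prolong a block. A 429 ("Too Many Requests") or a 5xx error, by contrast, often just means "slow down". A minimal sketch of a fetch helper that treats the two groups differently, assuming a fixed exponential back-off schedule; which codes go in each group, and the user-agent name, are illustrative assumptions rather than anything the server documents.

```python
import time
import urllib.error
import urllib.request

# Assumption: illustrative grouping of status codes.
STOP = {401, 403}                  # refused: retrying soon won't help
RETRY = {429, 500, 502, 503, 504}  # often temporary: wait, then retry


def classify(status):
    """Rough action for an HTTP status code while archiving."""
    if 200 <= status < 300:
        return "ok"
    if status in STOP:
        return "stop"
    if status in RETRY:
        return "retry"
    return "skip"  # e.g. 404: note it and move on


def backoff_delays(base=5.0, factor=2.0, retries=4):
    """Exponentially growing waits, e.g. 5, 10, 20, 40 seconds."""
    return [base * factor ** i for i in range(retries)]


def fetch_with_backoff(url, retries=4):
    """Fetch url; re-raise at once on stop/skip codes, back off on retry codes."""
    delays = backoff_delays(retries=retries)
    request = urllib.request.Request(url, headers={"User-Agent": "personal-archive-sketch"})
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if classify(err.code) != "retry" or attempt == retries:
                raise
            time.sleep(delays[attempt])  # wait longer after each temporary failure
```

On a 401 or 403 the helper gives up immediately, which is the point: at that stage the realistic fix is the one suggested above, asking the admin.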


27 posted on 06/18/2024 4:25:05 PM PDT by Pollard (Will work for high tunnel money!)
[In reply to #24]

To: Pollard

FT is one of something like 60 sites “the company” runs, and apparently they are shutting all of them down.


28 posted on 06/20/2024 12:52:01 AM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #27]

To: Pollard

...And traffic has been light recently. Late fall through early spring are usually “slow” anyway, unlike FR, where our “masters” in DC and elsewhere come up with new outrages daily, regardless of season.

A couple of the senior / premier members on FT have died off, and I don’t think the management has put much effort into keeping the site hopping, so to speak. In hindsight, I’d say someone should have looked into what kept other sites busy and tried to emulate that. Ditto for looking at which posts / vids on media like YouTube get lots of activity. And so on. But, that said, if all 60 sites are going down, then there are obviously “wider” problems.


29 posted on 06/20/2024 1:26:04 AM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #27]

To: Paul R.

The Wayback Machine (Archive.org) has a lot of the pages, but not all of them on every date... they archived the most pages in 2022:

https://web.archive.org/web/sitemap/https://www.fishingtalks.com/action-rods-21573/
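The Wayback Machine also offers a CDX search endpoint that lists which URLs under a site it has captured, which can be quicker than clicking through the sitemap view. A sketch, assuming the publicly documented CDX parameters (`matchType`, `output=json`, `fl`, `filter`, `collapse`, `from`/`to`); in JSON mode the reply is a list of rows whose first row is the column header. The `id_` modifier in the playback URL requests the capture without the Wayback toolbar. Function names are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site, year=None):
    """Build a CDX query listing captured URLs that start with site."""
    params = {
        "url": site,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "filter": "statuscode:200",  # only successful captures
        "collapse": "urlkey",        # one row per distinct URL
    }
    if year is not None:
        params["from"] = str(year)
        params["to"] = str(year)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn the CDX JSON reply (header row + data rows) into dicts."""
    rows = json.loads(text)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def snapshot_url(timestamp, original):
    """Wayback playback URL for one capture, without the Wayback toolbar."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"
```

For example, `cdx_query_url("fishingtalks.com/", year=2022)` builds a query for 2022 captures under the domain in the link above; the reply can be fetched with `urllib.request.urlopen` and passed to `parse_cdx_json`. A large site may return a very long list, and this sketch does no paging.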


30 posted on 06/20/2024 2:04:08 AM PDT by Drago
[In reply to #29]

To: Pollard; Theo

Hi, Pollard,

Well, I’ve been saving as much as I can as “Web Page, Complete” (file and folder), and so far have most of my own threads saved, plus some others. It is barely a dent, but other forum members are saving stuff too. One problem is that the downloads to SSD take several seconds, and I have to be careful because if I exit the page too soon to go to the next, Windows assumes I wanted to cancel the download, and does so. No error message, etc. I see it later if I go back over my downloads list. Bah!
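Because a save cancelled by navigating away fails silently, a quick audit of the download folder against a list of the threads one meant to save can catch the gaps. A sketch, with two assumptions labeled loudly: it relies on Chrome-based browsers stamping saved pages with a comment such as `<!-- saved from url=(0024)https://example.com/t/1/ -->` near the top, and on "Web Page, Complete" writing a companion `NAME_files` folder next to `NAME.html`. Both are browser conventions rather than guarantees, and the function names are hypothetical.

```python
import os
import re

# Assumption: Chrome-style "mark of the web" comment near the top of saved pages.
SAVED_FROM = re.compile(r"<!-- saved from url=\(\d+\)(\S+?)\s*-->")


def saved_page_urls(folder):
    """Map source URL -> filename for every .html/.htm page in folder."""
    found = {}
    for name in os.listdir(folder):
        if not name.lower().endswith((".html", ".htm")):
            continue
        with open(os.path.join(folder, name), encoding="utf-8", errors="replace") as f:
            head = f.read(4096)  # the comment sits near the very top
        match = SAVED_FROM.search(head)
        if match:
            found[match.group(1)] = name
    return found


def audit(folder, wanted_urls):
    """Return (wanted URLs with no saved page, saved pages lacking a _files folder)."""
    saved = saved_page_urls(folder)
    not_saved = [url for url in wanted_urls if url not in saved]
    # Assumption: "Web Page, Complete" names the asset folder NAME_files.
    no_assets = sorted(
        name for name in saved.values()
        if not os.path.isdir(os.path.join(folder, os.path.splitext(name)[0] + "_files"))
    )
    return not_saved, no_assets
```

Pages saved as .mhtml have no `_files` folder by design, so this check applies only to "Web Page, Complete" saves.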

I started a thread here on FR asking about saving as *.mhtml (single files), as the web pages download and save much faster, despite using up a bit more drive space, and in viewing them offline, at least for this (FT) website, they seem fine. Basically I wanted to know if there are any disadvantages to mhtml saves. The info I found online was a bit arcane and didn’t seem to indicate anything that would “mess up” re-reading the pages, although images and such are of course not saved individually. That’s not a problem for me.
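On the MHTML question: an .mhtml file is a MIME multipart/related bundle (the format described in RFC 2557), so the images are still inside it, typically as base64-encoded parts, and can be pulled out later if ever needed. Python's standard `email` package can read that structure. A minimal sketch, assuming a standards-conforming file; browser-specific quirks are possible, and the function names are hypothetical.

```python
import email
import os
from email import policy


def mhtml_parts(data):
    """Return (content type, Content-Location) for each resource in MHTML bytes."""
    msg = email.message_from_bytes(data, policy=policy.default)
    return [
        (part.get_content_type(), str(part.get("Content-Location", "")))
        for part in msg.walk()
        if not part.is_multipart()  # skip the multipart/related container itself
    ]


def extract_images(mhtml_path, out_dir):
    """Write every image/* part of an MHTML file to out_dir; return the filenames."""
    with open(mhtml_path, "rb") as f:
        msg = email.message_from_bytes(f.read(), policy=policy.default)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, part in enumerate(msg.walk()):
        if part.get_content_maintype() != "image":
            continue
        ext = part.get_content_subtype().split("+")[0]  # e.g. "png", "jpeg", "svg"
        name = f"image_{i:04d}.{ext}"
        with open(os.path.join(out_dir, name), "wb") as out:
            out.write(part.get_payload(decode=True))  # undoes base64 / quoted-printable
        written.append(name)
    return written
```

`extract_images("thread.mhtml", "thread_images")` would write each embedded image to its own file (paths are placeholders).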

However, my thread later just totally disappeared, so I assume a FR Mod didn’t like it. No explanation, no PM, nothing. The thread is just gone.

Anyway, perhaps you can weigh in on “Web Page, Complete” vs. saving as single *.mhtml files?

I’m copying this to Theo as well.

Thanks!


31 posted on 06/26/2024 6:18:59 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)
[In reply to #27]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson