Free Republic
Browse · Search
General/Chat
Topics · Post Article

Sunday Afternoon Computer Question: How to Archive a Website?
Me

Posted on 06/16/2024 4:00:43 PM PDT by Paul R.

A relatively small web forum I am a member of is shutting down soon. There are a lot of posts on it that I and other members would like to archive to HD for reference. Saving individual pages is VERY time-consuming, and we only have until the end of the month. Do any of our FReeper computer gurus have any experience with this?


TOPICS: Computers/Internet; Reference
KEYWORDS: archive; internet; save; vanity; website
I attempted to save the site with a program ostensibly made for that purpose, HTTrack, but get a mirror error. So far I'm unable to find a cause for the error - not that I know much about such errors. We only have until the end of June to find a way to archive the site. There has been some discussion of doing just that on a thread on the site, but none of us have any experience with this.

So far, at least, the forum "master" has not raised any objection to saving of the material for personal use, but is also not assisting and may not have any time or ideas anyway.

Do any FReepers have any experience with something like this? Maybe try some other archival program?

Thanks in advance!

1 posted on 06/16/2024 4:00:43 PM PDT by Paul R.

To: Paul R.

Is this a database driven website?

We need more info.


2 posted on 06/16/2024 4:02:51 PM PDT by Jeff Chandler (THE ISSUE IS NEVER THE ISSUE. THE REVOLUTION IS THE ISSUE.)

To: Paul R.

wget might work.

https://eternallybored.org/misc/wget/
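For anyone trying this, here is a rough sketch of what a wget mirroring command could look like, assembled in Python so it prints the same way on Windows or Linux. The forum address `https://forum.example.com/` and the folder name `forum-archive` are placeholders I made up, and the one-second wait is just a guess at a polite crawl rate. The flags themselves are standard wget options, but whether this particular combination suits the forum in question is untested.

```python
import shlex


def build_wget_command(start_url, dest_dir="forum-archive", wait_seconds=1):
    """Return a wget argument list for mirroring one site for offline reading."""
    return [
        "wget",
        "--mirror",                        # recursive download with timestamping
        "--convert-links",                 # rewrite links so saved pages work offline
        "--adjust-extension",              # add .html to saved pages that lack it
        "--page-requisites",               # also fetch images and CSS the pages need
        "--no-parent",                     # never climb above the starting directory
        f"--wait={wait_seconds}",          # pause between requests
        "--random-wait",                   # vary the pause a little
        f"--directory-prefix={dest_dir}",  # where to put the downloaded files
        start_url,
    ]


if __name__ == "__main__":
    # Placeholder URL: substitute the real forum address.
    cmd = build_wget_command("https://forum.example.com/")
    print(shlex.join(cmd))
    # To actually run it (wget must be installed and on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check it over, or paste it into a terminal, before letting it loose on the site.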


3 posted on 06/16/2024 4:04:14 PM PDT by E. Pluribus Unum (The worst thing about censorship is █████ ██ ████ ████ ████ █ ███████ ████. FJB.)

To: Paul R.

https://www.google.com/search?q=httrack+mirror+error


4 posted on 06/16/2024 4:04:38 PM PDT by Jeff Chandler (THE ISSUE IS NEVER THE ISSUE. THE REVOLUTION IS THE ISSUE.)

To: Paul R.

https://learn.microsoft.com/en-us/answers/questions/1167673/how-to-use-wget-command-on-windows-(for-recursive


5 posted on 06/16/2024 4:05:39 PM PDT by E. Pluribus Unum (The worst thing about censorship is █████ ██ ████ ████ ████ █ ███████ ████. FJB.)

To: Paul R.

Archiving Websites with Wget

A second vote for wget. Good luck!


6 posted on 06/16/2024 4:06:50 PM PDT by so_real ( "The Congress of the United States recommends and approves the Holy Bible for use in all schools.")

To: Paul R.

Paul,

I’d be happy to help. I’ve been doing website stuff for over 30 years.

The simplest solution is to just use SiteSucker. Easy to use, but make sure it’s configured right or it could end up saving webpages from other sites connected to your site.

Is it a WordPress based site, or some other kind of forum that consists of files and a database?

Let me know how I can help.


7 posted on 06/16/2024 4:19:54 PM PDT by Theo (FReeping since 1997 ... drain the swamp.)

To: Theo

+1 for Sitesucker


8 posted on 06/16/2024 4:59:25 PM PDT by zeebee

To: Jeff Chandler

I don’t think so. It’s just posts about fishing, fishing gear, etc. Plenty of pics.

It’s my go-to source for info and advice on non-Chinese gear; I avoid Chinese gear when possible these days.


9 posted on 06/16/2024 5:25:31 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)

To: Theo

This is not my site. But, again, the webmaster has not expressed any objection to saving the pages of the site for personal use, and he HAS been involved in the thread about the shutdown. Multiple members have been saving individual pages with copy/paste, etc. But even for a small forum, that’s not a practical way to save the whole thing or preserve any structure.

It’s ALMOST as if I heard FR was shutting down and I wanted to archive it, except that FR is MUCH, MUCH bigger. (Probably 10,000x if I had to guess.)

SFAIK it’s not a WordPress type site, but I am not sure. It IS a forum.


10 posted on 06/16/2024 5:32:01 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)

To: Paul R.; Theo

If it’s a forum, there’s a 99.9% chance it has a database for storing posts. Almost any website scraper/downloader will just grab the HTML that a browser gets from it and save that as HTML pages, plus images.

I’m on Linux/Ubuntu and I’ve used httrack for that. As someone mentioned about a different tool, you have to be careful what depth of links you grab. You might need 2-3, depending on how the site works. If the forum allows embedded YouTube videos, it could get big unless you figure out how to filter those out.

Like everything in tech, the answer is: it depends on pesky things like variables.

Plug the url into builtwith.com and you might get an idea what it’s ‘built with’. At any rate, you’re only going to be able to scrape the html pages that get rendered for the browser.
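To make the depth and filtering point concrete, here is a minimal sketch of the decision a mirroring tool makes for each page: which links to follow next. It is not how httrack itself is implemented. The function name `links_to_follow`, the default depth of 3, and the `forum.example.com` address are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to_follow(page_url, html, depth, max_depth=3):
    """Return same-site links worth crawling from a page found at `depth`."""
    if depth >= max_depth:
        return []  # deep enough: do not follow anything further
    site_host = urlparse(page_url).netloc
    collector = LinkCollector()
    collector.feed(html)
    keep = []
    for href in collector.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.netloc != site_host:
            continue  # skip other sites, which includes YouTube
        clean = absolute.split("#", 1)[0]  # drop in-page anchors
        if clean not in keep:
            keep.append(clean)
    return keep
```

The same-host rule is what keeps the archive from ballooning: anything hosted elsewhere, video sites included, is simply never followed.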


11 posted on 06/16/2024 5:46:13 PM PDT by Pollard (Will work for high tunnel money!)

To: so_real

I guess I’d better add that I’m using Windows 10 or 11...


12 posted on 06/16/2024 5:46:41 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)

To: Paul R.

Go to the URL of your site and add “/wp-admin” to the end. For example, if your site is www.example.com, you would go to www.example.com/wp-admin. If you get a login page, it is a WordPress site. If it is, there are WordPress plugins that do full backups.
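The same check can be scripted. This is only a sketch: the helper names `wp_admin_url`, `interpret_probe`, and `probe` are invented here, and it assumes the site is reachable over HTTPS. It relies on my understanding that a stock WordPress install redirects /wp-admin to a wp-login.php page. Any other response, such as an error code, is treated as inconclusive rather than as proof either way.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse


def wp_admin_url(site_url):
    """Build the /wp-admin address for a site, adding a scheme if missing."""
    if "://" not in site_url:
        site_url = "https://" + site_url  # assumption: the site uses HTTPS
    return urljoin(site_url.rstrip("/") + "/", "wp-admin")


def interpret_probe(status_code, final_url):
    """Classify the result of requesting /wp-admin."""
    path = urlparse(final_url).path
    if status_code == 200 and "wp-login.php" in path:
        return "likely WordPress"
    return "inconclusive"


def probe(site_url, timeout=10):
    """Request /wp-admin (following redirects) and classify the response."""
    url = wp_admin_url(site_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_probe(resp.status, resp.geturl())
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 403 or 404.
        return interpret_probe(err.code, url)
```

Calling `probe("www.example.com")` makes a real network request, so connection failures will raise an exception rather than return a result.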


13 posted on 06/16/2024 5:59:21 PM PDT by bankwalker (Repeal the 19th ...)

To: Paul R.

One quick thing to check is how much, if any, of the site has already been archived by archive.org’s Wayback Machine.
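If clicking around archive.org is confusing, the Wayback Machine also has a simple availability lookup that answers "is there a saved copy of this address?" one URL at a time. Below is a sketch of building that query and reading the JSON reply. The function names and the `forum.example.com` address are placeholders, and the response layout shown is my understanding of what the service returns, so treat it as an assumption to verify.

```python
import json
from urllib.parse import urlencode

# Wayback Machine availability lookup endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url):
    """Build the lookup URL for one page address."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Return the URL of the closest archived copy, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    # Placeholder address: open the printed URL in a browser to see the reply.
    print(availability_query("forum.example.com"))
```

This only reports the nearest snapshot for a single address; it does not list everything the archive holds for the site.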


14 posted on 06/16/2024 6:35:09 PM PDT by freeandfreezing

To: bankwalker

Ok, I tried that (adding the “/wp-admin”) and got a “forbidden access” error message.

So I guess that confirms it’s not a WordPress site.


15 posted on 06/16/2024 6:43:58 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)

To: freeandfreezing

I found it, but it seems to be bits and pieces. I am NOT sure I’m using the archive.org site correctly. It seems almost like another planet.


16 posted on 06/16/2024 7:24:01 PM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)

To: Paul R.

Anyone considered offering to buy the site and content from him?


17 posted on 06/16/2024 7:34:54 PM PDT by PAR35

To: so_real

Bkmk


18 posted on 06/16/2024 8:00:51 PM PDT by ptsal (Vote R.E.D. >>>Remove Every Democrat ***)

To: Paul R.

What is the URL?


19 posted on 06/16/2024 8:21:17 PM PDT by Jeff Chandler (THE ISSUE IS NEVER THE ISSUE. THE REVOLUTION IS THE ISSUE.)

To: PAR35

A few of the members have been talking about that possibility, but apparently it isn’t going anywhere. The site is one of ~60 forums owned by the same company, and apparently the whole thing is going down shortly.


20 posted on 06/17/2024 1:36:06 AM PDT by Paul R. (Bin Laden wanted Obama killed so the incompetent VP, Biden, would become President!)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson