Posted on 12/02/2021 3:36:29 PM PST by daniel1212
Sorry I am late to this thread, haven’t been online since earlier today, and at the moment only have my cell phone. So this is a quick note and I’ll try to write more later.
Shadowace is correct in that a combination of sed and awk applied in the context of a careful bash script, is probably your best tack.
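A minimal sketch along those lines, assuming GNU sed (the \L replacement escape is a GNU extension) and double-quoted href/src values; try it on a copy of the site first:

```shell
# Lowercase every double-quoted href/src value in all .html files
# under the current directory. GNU sed's \L lowercases the rest of
# the replacement, and the I flag makes the match case-insensitive.
find . -name '*.html' -exec \
  sed -i -E 's/(href|src)="([^"]*)"/\1="\L\2"/gI' {} +
```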
Someone may have already mentioned this, but I will as well. You may also wish to consider people who have made bookmarks using the mixed-case URLs, meaning that even if you get all the internal references lowercase, and the file names are lowercase, those bookmarks will still fail. So I suggest that you record the names of the files in the mixed-case format, and make a set of symbolic links that direct mixed-case accesses to the lowercase file names.
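A sketch of how those links could be built, assuming you saved the original mixed-case names, one per line, in a file (hypothetically named mixedcase-names.txt here) before renaming, and that your web server follows symlinks:

```shell
# For each recorded mixed-case name, create a symlink pointing at
# the lowercased file, so old bookmarks using the original name
# still resolve. Names that were already all lower case are skipped.
while IFS= read -r orig; do
    lower=$(printf '%s\n' "$orig" | tr '[:upper:]' '[:lower:]')
    if [ "$orig" != "$lower" ] && [ -f "$lower" ]; then
        ln -s "$lower" "$orig"
    fi
done < mixedcase-names.txt
```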
All for now, I’ll try to write back later with perhaps a sample script.
And just like that,
FreeRepublic turned into StackOverflow.
OK.
Thanks. Middle of ministry phone call now, so I will have to get back to this soon.
OK. A perl of great advice.
Well, let's give FR more chances!
Indeed. Been there a lot!
I’d use Power Shell. You can DuckDuckGo to find a script.
I’ve done similar things.
I am back now. What program are we talking about here to do this "#pass 1"?
OK then. I've been running Linux for 15 years, and when I type a Linux question into DuckDuckGo, 95% of the time I will find it on the sources I mentioned. DuckDuckGo even has a little side blurb for Stack Exchange, similar to a Wikipedia side blurb.
Open File Explorer
hover your mouse over your folder
Shift+RightClick and select "Open PowerShell Window Here"
In your PowerShell script, remove the "-Recurse" so the script will only run in the current folder
I did ask on the Multi Commander Support Forum but no reply yet. I am quite sure there is a regex script that will do this.
I see others have tried but I cannot understand it.
https://stackoverflow.com/questions/7746175/how-to-convert-regex-pattern-match-to-lowercase-for-url-standardization-tidying
"Re: Edit the PowerShell Script to only run the current directory. Open File Explorer hover your mouse over your folder Shift+RightClick and select "Open PowerShell Window Here" In your PowerShell script, remove the "-Recurse" so the script will only run in the current folder"
I just realized that this is for changing file names, which is easily done via software and which I already mostly did, as said. What I actually want is a way to change all the strings within the text that begin with http and end with html.
I would think Multi Commander would have a way to do this amidst its multitude of options and plug ins:
But studying this, http://multicommander.com/docs/FindAndReplaceInFiles, it says "Finding text using regular expressions is not yet enabled."
I actually have been to stackexchange a lot to no avail. Not the operation I need, though this new find sounds like it:
find . -name "*.html" -exec perl -pi -e '$q=qr/"|\x27/; s{\b(src|href)=($q?.+?$q?)\b}{$1=\L$2}gi;' {} +
But how to adapt and run it I know not.
I know the index page works, but I was asking about some of the linked pages, like http://www.peacebyjesus.net/Bible/Matthew_1.html
Checking here, https://www.deadlinkchecker.com/website-dead-link-checker.asp, shows that of 2000/2000 URLs checked (the limit for this free tool), 1821 were OK, while 179 failed. This is after I changed most of my index page links to lower case; the broken links are those with upper-case letters. But why some links with upper-case letters work when the file name is lower case (or perfectly corresponds to the upper case) and some do not remains a mystery.
After extensive searching, and suggestions that either were not what I need or ended up being faulted, plus many whose starting point was beyond a newbie, and after being warned "You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML," and after experimenting with some offerings (which selected far too much), I found one regex pattern that seems to work in the free Text Crawler.
The source of that Regex pattern states,
We can construct a well-formed regular expression to match and extract the link values from the above text as follows:
href="http[s]?://.+?"
Our regular expression looks for strings that start with "href="http://" or "href="https://", followed by one or more characters (.+?), followed by another double quote. The question mark behind the [s]? indicates to search for the string "http" followed by zero or one "s". The question mark added to the .+? indicates that the match is to be done in a "non-greedy" fashion instead of a "greedy" fashion. A non-greedy match tries to find the smallest possible matching string, and a greedy match tries to find the largest possible matching string.
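The difference is easy to see on a line that contains two links (a sketch using GNU grep's -P flag for Perl-style regex):

```shell
line='<a href="http://a.com/X.html">1</a> <a href="http://b.com/Y.html">2</a>'

# Non-greedy: each match stops at the first closing quote,
# so the two links are found separately.
printf '%s\n' "$line" | grep -oP 'href="http[s]?://.+?"'

# Greedy: .+ runs on to the LAST quote on the line, swallowing
# both links (and the markup between them) in a single match.
printf '%s\n' "$line" | grep -oP 'href="http[s]?://.+"'
```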
So I had Text Crawler change what it found (in a separate folder) into lower case, which reported 383 files searched, 7154 matches changed (this includes matches already in all lower case) in 112 files. However, this did not include subfolders, which were done next, with 862 files searched and 1614 matches changed in 68 files. I used spacetornado Renamer to rename the file names themselves.
The next step is deleting all my present folders and files on the server and uploading the new ones, and then once again checking them for bad ones, all of which will take some time.
I also found software that simply changes links to lower case, though it costs $20 to do more than one file at a time; but that is a rare find.
Thanks to all for their suggestions.
I assume you want to make all URLs lowercase? Detecting URLs is not as easy as it may seem, because if you want to capture really all possible URLs then the regex gets very complicated. However, to search for the most common URL schemes, this regex should suffice:
((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))
and as the replace string:
\L$1\E
I tried it in Text Crawler (see post up above) and it seemed to work fine (even with a single parenthesis), though I did not use the replace string since Text Crawler offers that as an option. Thank God for help.
In further research:
For me, to convert upper-case letters in links to lower case,
(http.*[A-Z].*\.[a-zA-Z]{2,4}) worked as a regex in the free Text Crawler (though mine is an old ver. 3.0.3) to find 2,238 links in a large file of mine,
as does "http[s]?://.+?"
though (courtesy of Stefan from Grepwin) the pattern of
((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])) - which includes the parentheses - seems to find more (32,006), and
href="http[s]?://.+?" finds 30,033 in Text Crawler.
I did not try the \L\1 to convert to lower case, since that is an option offered in the program, and running the above says 30,033 were changed, though it only changed the case (I hope!).
I had also found HTML Tags Change To Uppercase or Lowercase Software that converts upper-case letters in links to lower case, though it is $20 if you want to convert more than one file at a time, and it is very slow or can lock up with large files; but it seemed to work well to convert upper-case letters in links to lower case in the files I used it for.
I do not know regex (imagine a world in which this was the written language!) but was looking to convert all the upper-case letters in links to lower case, and I searched a lot trying to find out how to do this. So thanks to all who helped, whom I thank God for.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.