Free Republic

How can I change all characters in link addresses (or only between http:// and .html) within all files in selected folder(s) to lower case? Maybe a batch file or regex string. I have Windows utilities such as Multi Commander, GrepWin, etc.
Freerepublic.com ^ | Thu, 12/02/21 | daniel1212

Posted on 12/02/2021 3:36:29 PM PST by daniel1212



To: daniel1212; ShadowAce

Sorry I am late to this thread, haven’t been online since earlier today, and at the moment only have my cell phone. So this is a quick note and I’ll try to write more later.

ShadowAce is correct that a combination of sed and awk, applied in a careful bash script, is probably your best tack.

Someone may have already mentioned this, but I will as well. You may also wish to consider people who have made bookmarks using the mixed-case URLs: even if you get all the internal references and the file names into lower case, those bookmarks will still fail. So I suggest that you record the names of the files in their mixed-case form, and make a set of symbolic links that direct mixed-case accesses to the lower-case file names.
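A minimal sketch of that symbolic-link idea, written in Python rather than bash. It assumes a case-sensitive filesystem that allows symbolic links (as on a typical Linux web host), and the function name `lowercase_with_symlinks` is made up for illustration. Try it on a copy of the site first.

```python
import os
from pathlib import Path


def lowercase_with_symlinks(root):
    """Rename mixed-case .html files to lower case, leaving a symlink under the old name.

    Returns a list of (old_name, new_name) pairs for the files that were changed.
    """
    changed = []
    # Build the full listing first so renames do not disturb the directory walk.
    for path in sorted(Path(root).rglob("*.html")):
        lower_name = path.name.lower()
        if path.is_symlink() or path.name == lower_name:
            continue  # already lower case, or a link made on an earlier run
        target = path.with_name(lower_name)
        if target.exists():
            # Name clash, or a case-insensitive filesystem where both names
            # refer to the same file: leave it alone.
            continue
        path.rename(target)
        # Relative link: the old mixed-case name now points at the new file.
        os.symlink(lower_name, path)
        changed.append((path.name, lower_name))
    return changed
```

Only file names are handled here; mixed-case folder names would need the same treatment, and the web server has to be set up to follow symbolic links.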

All for now, I’ll try to write back later with perhaps a sample script.


41 posted on 12/02/2021 5:49:41 PM PST by dayglored ("Listen. Strange women lying in ponds distributing swords is no basis for a system of government.")
[ Post Reply | Private Reply | To 3 | View Replies]

To: daniel1212

And just like that,
FreeRepublic turned into StackOverflow.


42 posted on 12/02/2021 6:04:21 PM PST by Flick Lives (The future is a quiet world)
[ Post Reply | Private Reply | To 1 | View Replies]

To: ShadowAce
" The above is the command to change text to lower case."

OK.

43 posted on 12/02/2021 6:06:37 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 34 | View Replies]

To: BiglyCommentary
"Here’s your framework. You want to do this in two passes. First pass - change all the html file names to lower case. Second pass - change all references to those files within the html files to lower case to match. You go to the root where the files exist.
#pass 1
find . -name "*.html" -exec (file names to lower case rename script) {} \;
#pass 2
find . -name "*.html" -exec (file edit to lower case script) {} \;
So the () above just needs to be filled in. And you always copy your root to a test area before you let the above rip."

Thanks. Middle of ministry phone call now, so I will have to get back to this soon.

44 posted on 12/02/2021 6:08:39 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 36 | View Replies]

To: Woodman
"Sorry, was thinking too technical. What I was saying is a simple Perl script will take text and rewrite it all in lower case (or upper case). With a few more lines of code it can be made to search within text and convert as well. PERL is a scripting language written mainly to manipulate files and text, although it can do much more. I’m just a very basic user of PERL, not really a programmer. However, all the languages I know have a basic to-upper or to-lower command. Some are easier to use than others. HTML is really just a text file and can be manipulated as such."

OK. A perl of great advice.

45 posted on 12/02/2021 6:10:59 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 37 | View Replies]

To: Pollard
"https://unix.stackexchange.com/ Someone there will answer and a bunch of other people will vote and then you take the highest voted answer. The answer will likely give you the regex/code you need"

Well, let's give FR more chances!

46 posted on 12/02/2021 6:12:49 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 38 | View Replies]

To: Flick Lives
"And just like that, FreeRepublic turned into StackOverflow."

Indeed. Been there a lot!

47 posted on 12/02/2021 6:15:06 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 42 | View Replies]

To: daniel1212

I’d use PowerShell. You can DuckDuckGo to find a script.

I’ve done similar things.


48 posted on 12/02/2021 6:16:30 PM PST by gitmo (If your theology doesn't become your biography, what good is it?)
[ Post Reply | Private Reply | To 47 | View Replies]

To: BiglyCommentary
"You go to the root where the files exist.
#pass 1
find . -name "*.html" -exec (file names to lower case rename script) {} \;
#pass 2
find . -name "*.html" -exec (file edit to lower case script) {} \;
So the () above just needs to be filled in. And you always copy your root to a test area before you let the above rip."

I am back now. What program are we talking about here to do this #pass 1?
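For anyone without `find` on hand (it is a Unix command-line tool, not a Windows one), the same two-pass framework can be sketched in Python. This is only an illustration under stated assumptions: the function names are invented, the files are assumed to be UTF-8, and the default pass-2 transform lower-cases the whole file, which is far too blunt for real pages and is only there to show where the "()" gets filled in.

```python
import os


def rename_to_lower(path):
    """Pass 1 action: rename one file so its name is all lower case."""
    folder, name = os.path.split(path)
    if name != name.lower():
        os.rename(path, os.path.join(folder, name.lower()))


def rewrite_file(path, transform=str.lower):
    """Pass 2 action: rewrite one file through `transform` (assumes UTF-8 text)."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        text = handle.read()
    new_text = transform(text)
    if new_text != text:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(new_text)


def for_each_html(root, action):
    # Rough equivalent of: find root -name "*.html" -exec action {} \;
    # Paths are collected first so that renaming does not confuse the walk.
    paths = [
        os.path.join(folder, name)
        for folder, _dirs, files in os.walk(root)
        for name in files
        if name.lower().endswith(".html")
    ]
    for path in sorted(paths):
        action(path)
    return len(paths)


def two_passes(root, transform=str.lower):
    for_each_html(root, rename_to_lower)                       # pass 1
    for_each_html(root, lambda p: rewrite_file(p, transform))  # pass 2
```

Calling `two_passes("C:/site-copy")` on a test copy (the path is hypothetical) would run both passes; in practice `transform` should be a function that only touches link addresses.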

49 posted on 12/02/2021 6:21:27 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 36 | View Replies]

To: daniel1212

ok then. I’ve been running linux for 15 years and when I type in a linux question into duckduckgo, 95% of the time, I will find it on the sources I mentioned. DuckDuckGo even has a little side blurb similar to a wikipedia side blurb for stackexchange.


50 posted on 12/02/2021 6:25:47 PM PST by Pollard (PureBlood -- youtube.com/watch?v=VXm0fkDituE)
[ Post Reply | Private Reply | To 46 | View Replies]

To: daniel1212
Re: Edit the PowerShell Script to only run the current directory.

Open File Explorer
hover your mouse over your folder
Shift+RightClick and select "Open PowerShell Window Here"
In your PowerShell script, remove the "-Recurse" so the script will only run in the current folder

51 posted on 12/02/2021 6:29:01 PM PST by T.B. Yoits
[ Post Reply | Private Reply | To 27 | View Replies]

To: Pollard
"You’re mostly asking people who can’t even do html as evidenced by three separate FR threads where someone learned to post an image or you’re dealing with people who used to be techies a decade or two ago or are in some obscure tech field."

I did ask on the Multi Commander Support Forum but no reply yet. I am quite sure there is a regex script that will do this.

I see others have tried but I cannot understand it.

https://stackoverflow.com/questions/7746175/how-to-convert-regex-pattern-match-to-lowercase-for-url-standardization-tidying

52 posted on 12/02/2021 6:42:10 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 38 | View Replies]

To: T.B. Yoits
gci -Recurse | ? { $_.Name -cne $_.Name.ToLower() } | % { ren $_.FullName -NewName $_.Name.ToLower() }

"Re: Edit the PowerShell Script to only run the current directory. Open File Explorer hover your mouse over your folder Shift+RightClick and select "Open PowerShell Window Here" In your PowerShell script, remove the "-Recurse" so the script will only run in the current folder"

I just realized that this is for changing file names, which is easily done via software and which I have already mostly done, as said. What I want is a way to change to lower case all the characters within any text that begins with http and ends with html.

I would think Multi Commander would have a way to do this amidst its multitude of options and plug-ins.

But studying the Multi Commander docs at http://multicommander.com/docs/FindAndReplaceInFiles, it says "Finding text using regular expressions is not yet enabled."

53 posted on 12/02/2021 7:57:14 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 51 | View Replies]

To: Pollard
"ok then. I’ve been running linux for 15 years and when I type in a linux question into duckduckgo, 95% of the time, I will find it on the sources I mentioned. DuckDuckGo even has a little side blurb similar to a wikipedia side blurb for stackexchange."

I actually have been to stackexchange a lot to no avail. Not the operation I need, though this new find sounds like it:

find . -name "*.html" -exec perl -pi -e \
    's{\b(src|href)=(["\x27])(.*?)\2}{$1=$2\L$3\E$2}gi;' {} +

But how to adapt and run it I know not.
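One way to get the same effect without Perl is a short Python function. This is a sketch under assumptions, not a tested tool: the name `lowercase_link_attrs` is made up, and it only handles `href` and `src` values that are wrapped in matching single or double quotes on one line.

```python
import re

# Group 1: the attribute name and "=", group 2: the opening quote,
# group 3: the address, then the same quote character again (\2).
LINK_ATTR = re.compile(r"""(\b(?:href|src)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def lowercase_link_attrs(html):
    """Lower-case only the quoted values of href and src attributes."""
    def fix(match):
        prefix, quote, value = match.groups()
        return f"{prefix}{quote}{value.lower()}{quote}"
    return LINK_ATTR.sub(fix, html)
```

For example, `lowercase_link_attrs('<a href="Bible/Matthew_1.html">Matthew 1</a>')` leaves the visible text "Matthew 1" alone and changes only the address. Note that it also lower-cases links to other people's sites, which may break them if those servers are case-sensitive.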

54 posted on 12/02/2021 8:05:23 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 50 | View Replies]

To: Repeal The 17th; Pollard; Woodman; T.B. Yoits; bhl; ShadowAce; BiglyCommentary
"Works just fine..."

I know the index page works, but I was asking about some of the linked pages, like http://www.peacebyjesus.net/Bible/Matthew_1.html

Checking with https://www.deadlinkchecker.com/website-dead-link-checker.asp shows that of 2000/2000 URLs checked (the limit for this free tool), 1821 were OK while 179 failed. That is after I changed most of my index page links to lower case, and the broken links are those with upper case letters. But why some links with upper case letters work when the file name is lower case (or perfectly corresponds to the upper case) and some do not is still a mystery.

After extensive searching, I received many suggestions that were not what I needed or turned out to be faulty, plus many whose starting point was beyond a newbie. I was also warned, "You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML." Yet after experimenting with some offers (which selected far too much), I found one regex pattern that seems to work in Text Crawler free,
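A small local check can complement an online link checker by listing which links in a page still contain upper case letters. The sketch below is an assumption-laden illustration: the function name is invented, it only looks at double-quoted `href` values, and the `site` prefix is just the address mentioned above.

```python
import re

# Double-quoted href values only; the address itself is group 1.
HREF = re.compile(r'href="(.+?)"', re.IGNORECASE)


def links_with_uppercase(html, site="http://www.peacebyjesus.net/"):
    """Return links to `site` whose address still has upper case letters."""
    found = []
    for url in HREF.findall(html):
        # Compare the prefix without regard to case, then test the whole address.
        if url.lower().startswith(site.lower()) and url != url.lower():
            found.append(url)
    return found
```

Running this over each page before and after a conversion gives a quick count of what is left to fix.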

digitalvolcano.co.uk/textcrawler.html

The source of that Regex pattern states,

We can construct a well-formed regular expression to match and extract the link values from the above text as follows:

href="http[s]?://.+?"

Our regular expression looks for strings that start with “href="http://” or “href="https://”, followed by one or more characters (.+?), followed by another double quote. The question mark behind the [s]? indicates to search for the string “http” followed by zero or one “s”.

The question mark added to the .+? indicates that the match is to be done in a “non-greedy” fashion instead of a “greedy” fashion. A non-greedy match tries to find the smallest possible matching string and a greedy match tries to find the largest possible matching string.
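The difference between the two kinds of match is easy to see in a short Python experiment. The sample line is made up for illustration; the first pattern is the one quoted above and the second is the same pattern with the non-greedy question mark removed.

```python
import re

sample = ('<a href="http://A.net/One.html">one</a> '
          '<a href="http://B.net/Two.html">two</a>')

# Non-greedy: each match stops at the first closing quote it can reach.
lazy = re.findall(r'href="http[s]?://.+?"', sample)

# Greedy: the match runs on to the last double quote on the line.
greedy = re.findall(r'href="http[s]?://.+"', sample)

print(len(lazy))    # 2
print(len(greedy))  # 1
for match in lazy:
    print(match)
```

The greedy version swallows both links and the text between them as a single match, which is why a lower-case conversion based on it would change far more than the link addresses.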

So I had Text Crawler change what it found (in a separate folder) into lower case, which reported 383 files searched and 7154 matches changed (this includes matches already in all lower case) in 112 files. However, this did not include sub folders, which were done next, with 862 files searched and 1614 matches changed in 68 files. I used spacetornado Renamer to rename the files themselves.

The next step is deleting all my present folders and files on the server and uploading the new ones, and then once again checking them for bad ones, all of which will take some time.

I also found software that simply changes links to lower case, which is a rare find, though it is $20 to do more than one file at a time.

Thanks to all for their suggestions.

55 posted on 12/03/2021 8:53:47 AM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 30 | View Replies]

To: daniel1212
For the record, I had emailed the programmer of Grepwin on this, and he courteously replied that,
I assume you want to make all urls lowercase? Detecting urls is not as easy as it may seem, because if you want to capture really all possible urls then the regex gets very complicated. However, to search for most common url schemes, this regex should suffice
((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))
and as the replace string:
\L$1\E
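The `\L$1\E` replace string is a feature of some regex engines (it lower-cases the captured text) and is not available in every tool. As a hedged illustration, the same pattern can be used from Python, where the lower-casing is done by a small function instead. The name `lowercase_urls` is made up for this sketch.

```python
import re

# The pattern from the GrepWin author, unchanged, split over two lines.
URL = re.compile(
    r"((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))"
    r"([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))"
)


def lowercase_urls(text):
    """Lower-case every matched URL and leave all other text alone."""
    # Python has no \L ... \E, so the match is lower-cased in a callback.
    return URL.sub(lambda match: match.group(1).lower(), text)
```

Because the pattern does not depend on `href=`, it also changes URLs that appear as plain text in a page, not only those inside link tags.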

I tried it in Text Crawler (see post up above) and it seemed to work fine (even with single parenthesis), though I did not use the replace string since Text Crawler offers that conversion as an option. Thank God for help.

56 posted on 12/05/2021 9:22:22 PM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 55 | View Replies]

To: daniel1212
In further research:

For me, to convert upper case letters in links to lower case,

(http.*[A-Z].*\.[a-zA-Z]{2,4}) worked as a regex in Text Crawler free (though mine is an old ver. 3.0.3) to find 2,238 links in a large file of mine,

as does "http[s]?://.+?"

though (courtesy of Stefan from Grepwin ) the pattern of

((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])) - which includes the parentheses - seems to find more (3,2006) and,

href="http[s]?://.+?" finds 3,0033 in Text Crawler.

I did not try the \L\1 to convert to lower case since that is an option offered in the program, and running the above says 3,0033 were changed, though it only changed the case (I hope!).
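To see why the patterns report different totals, they can be run side by side on the same text. This Python sketch is only illustrative: the labels and the function name are invented, and counts from Python's regex engine may not match Text Crawler's exactly.

```python
import re

# The four patterns discussed above, keyed by made-up labels.
PATTERNS = {
    "uppercase then extension": r"(http.*[A-Z].*\.[a-zA-Z]{2,4})",
    "quoted http": r'"http[s]?://.+?"',
    "GrepWin": (r"((http|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))"
                r"([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-]))"),
    "href quoted": r'href="http[s]?://.+?"',
}


def count_matches(text):
    """Return how many matches each pattern finds in `text`."""
    return {
        label: sum(1 for _ in re.finditer(pattern, text))
        for label, pattern in PATTERNS.items()
    }
```

The first pattern is greedy, so on a line holding several links it can report one long match where the others report one match per link, while the GrepWin pattern also counts URLs that are not inside quotes.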


I had also found HTML Tags Change To Uppercase or Lowercase Software, which converts upper case letters in links to lower case. It is $20 if you want to convert more than one file at a time, and it is very slow or can lock up with large files, but it seemed to work well on the files that I used it for.

I do not know regex (imagine a world in which this was the written language!) but was looking to convert all the upper case letters in links to lower case, and I searched a lot trying to find out how to do this. And so thanks to all who helped, whom I thank God for.

57 posted on 12/06/2021 8:06:33 AM PST by daniel1212 ( Turn to the Lord Jesus as a damned+destitute sinner, trust Him to save + be baptized + follow Him!)
[ Post Reply | Private Reply | To 56 | View Replies]



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson