Recently I had cause to collect every Request for Comments (RFC) and Phrack article ever published, for mining purposes. It made for a good quick little bash exercise, and improved my curlin'! The results are as follows:
A very straightforward snippet for grabbing the first 1000 RFCs:
#!/bin/bash
URL_TEMP="http://www.networksorcery.com/enp/rfc/rfc"
for i in {1..1000}; do
  FULL="$URL_TEMP$i.txt"
  # a plain quoted filename works; the $(echo ...) wrapper was redundant
  curl "$FULL" > "rfc$i.txt"
done
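One catch: that version saves the 404 error pages as rfcN.txt too. A slightly improved take, sketched below against the same networksorcery URL, uses curl's -f flag (treat HTTP errors as failures rather than saving the error page), -s (silent), and -o (name the output file); combined, a 404 never creates a file, because curl aborts before writing any body:

#!/bin/bash
URL_TEMP="http://www.networksorcery.com/enp/rfc/rfc"
for i in {1..1000}; do
  # -f: fail on HTTP 4xx/5xx, so no error page gets saved
  # -s: no progress meter; -o: write the response body to the named file
  curl -fs -o "rfc$i.txt" "$URL_TEMP$i.txt"
done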
Here's the Phrack grab, which takes advantage of the same -f and -o trick so that only non-404 responses get saved to disk:
#!/bin/bash
n=0
for i in {1..69}; do
  for x in {1..30}; do
    FILE="phrack i$i v$x.txt"
    # Redirecting stdout makes the shell create "$FILE" before curl even
    # runs, so every 404 still leaves an empty file behind:
    #curl -fs http://www.phrack.org/archives/issues/$i/$x.txt > "$FILE"
    # With -o, curl itself opens the file, and with -f it aborts on a 404
    # before writing any body, so no file is created:
    curl -o "$FILE" -fs http://www.phrack.org/archives/issues/$i/$x.txt
    # given -f, the size check below is redundant, but it's left in for clarity
    if [ ! -s "$FILE" ]; then
      echo "404! Skipping"
    else
      echo "Article Found, Saving as #$n..."
      mv "$FILE" "phrack_$n.txt"
      ((n++))
    fi
  done
done
# Result: 1363 articles retrieved
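To sanity-check that figure after a run, a quick count of the renamed files (assuming they landed in the working directory) should match the final value of n:

# count the saved Phrack articles
ls phrack_*.txt | wc -l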