Tarhuntaš

DLI Books to DjVu

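As I’m also one of those who read books from the DLI (Digital Library of India), and I don’t particularly like fetching the pages one by one in TIFF format, I’ve been tinkering with this script for about a year, and I think it’s fairly decent by now. It expects a link from the search results as its first argument; quote the link, since the shell would otherwise interpret its & characters. The script downloads every page, converts each one to DjVu with cjb2, and bundles them into a single file with djvm.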

Perhaps it will be useful to someone else. If you try it, please do tell me how it fared.

#!/bin/bash

# test if anything was given to the script
if [[ -z $1 ]]; then
	echo "You must give the URL to be processed."
	echo "Enter 'ocr' as second argument to use ocrodjvu."
	exit 1
fi

url=$1

# converts the % codes that appear in DLI links
url=$(echo "$url" | sed -e "s/%20/ /g" -e "s/%27/'/g" -e "s/%28/(/g" -e "s/%29/)/g")

# gets the variables
 title=$(echo "$url" | sed -r "s/.*title1=([^&]*)&.*/\1/")
author=$(echo "$url" | sed -r "s/.*author1=([^&]*)&.*/\1/")
 pages=$(echo "$url" | sed -r "s/.*pages=([^&]*)&.*/\1/")
  path=$(echo "$url" | sed -r "s/.*url=([^&]*)/\1/")

# shows them
echo ""
echo -e "Author:\t$author"
echo -e "Title:\t$title"
echo -e "Pages:\t$pages"
echo -e "Path:\t$path"

# assembles the filename ("/" can't appear in a file name, so it is replaced)
filename="$author - $title"
filename=${filename//\//-}

# creates the book directory, with subdirectories to hold
# the .tif and .djvu files, unless they already exist
if [ -d "$filename" ]; then
	echo -n "The directory '$filename' already exists. "
fi
mkdir -p "$filename/tif" "$filename/djvu"

cd "$filename/tif" || exit 1

# if there is a 'last' file, makes the script continue
# from the page after it; otherwise, starts from 1
if [ -f last ]; then
	lastpage=$(cat last)
	echo "Resuming after page $lastpage..."
	# 10# forces base 10; a zero-padded number would otherwise be read as octal
	firstpage=$(( 10#$lastpage + 1 ))
else
	firstpage=1
fi

echo ""
tput sc

# the server is picked by a hack that happens to work:
# paths under data1, data2 or data3 are on new1, the rest on new
if [[ $path == *data[123]* ]]; then
	server=http://www.new1.dli.ernet.in
else
	server=http://www.new.dli.ernet.in
fi

# iterates the download for each page; "-N" avoids downloading
# files again by comparing each file's timestamp with the server's
for i in $(seq "$firstpage" "$pages"); do

	page=$(printf "%08d" "$i")
	echo -n "Page $page..."

	if wget -N -q --random-wait "$server/$path/PTIFF/$page.tif"; then
		echo -n " done."

		# converts to djvu, and only then records the page as finished,
		# so an interrupted conversion is redone when resuming
		cjb2 "$page.tif" "../djvu/$page.djvu" > /dev/null 2>&1
		echo "$page" > last

		tput el1
		tput rc
	else
		echo " error!"
		# saves the cursor below the error, so the message stays visible
		tput sc
	fi

done

cd ..

# assembles the djvu pages in one bundle, next to the book directory
djvm -c ../"$filename".djvu djvu/*.djvu

# ocr: the bundle was written to the parent directory, so both paths point there
if [ "$2" = "ocr" ]; then
	ocrodjvu -o ../"$filename (ocr)".djvu ../"$filename".djvu
fi
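The script does its parsing and URL building inline. As a sketch only, the same steps can be pulled out into small bash functions that are easier to try one at a time. The names `urldecode`, `get_param`, `page_url` and `next_page`, and the sample path used with them, are hypothetical; `urldecode` swaps in a more general technique (turning every %XX into a \xXX escape for bash’s `printf %b`) instead of the four fixed sed substitutions, and `get_param` assumes each parameter follows a `?` or `&`.

```shell
#!/bin/bash

# Decodes every %XX escape, not only %20, %27, %28 and %29.
# Caveat: printf %b also expands backslash sequences already in the input.
urldecode() {
	printf '%b' "$(printf '%s' "$1" | sed 's/%/\\x/g')"
}

# Prints the value of query parameter $1 in URL $2 (nothing if it is absent).
# Assumes the parameter follows "?" or "&" and its value contains no "&".
get_param() {
	printf '%s\n' "$2" | sed -nE "s/.*[?&]$1=([^&]*).*/\1/p"
}

# Builds the address of page number $2 (a plain integer, no leading zeros)
# of the book stored under path $1, using the script's server rule:
# paths containing data1, data2 or data3 live on new1.
page_url() {
	local server=http://www.new.dli.ernet.in
	case $1 in
		*data[123]*) server=http://www.new1.dli.ernet.in ;;
	esac
	printf '%s/%s/PTIFF/%08d.tif\n' "$server" "$1" "$2"
}

# Next page to fetch after the zero-padded number kept in 'last';
# 10# stops bash from reading "00000009" as an (invalid) octal number.
next_page() {
	echo $(( 10#$1 + 1 ))
}
```

For instance, `page_url "$(get_param url "$link")" 7` would print the address of page 7 for a search-result link stored in a (hypothetical) `$link` variable.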

A Collatz Conjecture’s Bonsai


I recently made this after seeing xkcd’s cartoon about the Collatz Conjecture. It may be just me, but I rather like it. It was made by iterating from 1 to 10000 and calculating the possible paths from each number: if a number leads only to its double, it’s blue; if it also leads to an odd number, it’s red; the grey ones weren’t iterated, but were calculated implicitly by doubling the previous ones. Et voilà!
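The colouring rule can be sketched in bash, assuming the bonsai is the Collatz tree grown backwards from 1: from any n you can always step back to 2n, and also to the odd number (n - 1)/3 whenever n leaves remainder 4 when divided by 6, which is exactly when (n - 1)/3 is an odd integer. This is a reconstruction from the description above, not the code that drew the picture; the function names are hypothetical, and skipping 1 as a predecessor of 4 is an assumption.

```shell
#!/bin/bash

# Prints the numbers that reach $1 in one Collatz step:
# always 2n (it is even, so it halves to n), plus the odd m = (n - 1) / 3
# when n % 6 == 4, because then 3m + 1 = n.
# Assumption: m = 1 (the case n = 4) is skipped, so the tree grown
# from 1 does not loop back into the 1 -> 4 -> 2 -> 1 cycle.
collatz_predecessors() {
	local n=$1
	local preds=$(( 2 * n ))
	if (( n % 6 == 4 && n > 4 )); then
		preds="$preds $(( (n - 1) / 3 ))"
	fi
	echo "$preds"
}

# Colour in the picture: blue if n leads only to its double,
# red if it also leads to an odd number.
collatz_colour() {
	local preds
	read -ra preds <<< "$(collatz_predecessors "$1")"
	if (( ${#preds[@]} > 1 )); then
		echo red
	else
		echo blue
	fi
}

# Usage: colour and predecessors of the first few numbers.
for n in $(seq 1 12); do
	echo "$n: $(collatz_colour "$n") ($(collatz_predecessors "$n"))"
done
```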

BASH Miscellanea

How to read a $file line by $line:

while IFS= read -r line; do echo "$line"; done < "$file"

(IFS= keeps leading and trailing whitespace, and -r keeps backslashes intact.)
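Feeding the loop by redirection rather than through `cat` matters beyond style: in bash each part of a pipeline runs in a subshell, so variables set inside a piped `while read` loop are lost once it ends. A minimal sketch of the difference (both function names are hypothetical):

```shell
#!/bin/bash

# Counts lines with a piped loop: the loop runs in a subshell,
# so the counter printed afterwards is still 0.
count_piped() {
	local n=0 line
	cat "$1" | while IFS= read -r line; do n=$((n + 1)); done
	echo "$n"
}

# Counts lines with a redirected loop: it runs in the current shell,
# so the counter survives the loop.
count_redirected() {
	local n=0 line
	while IFS= read -r line; do n=$((n + 1)); done < "$1"
	echo "$n"
}
```

Also note that `read` returns false on a final line without a trailing newline, so the loop body skips it; writing the condition as `IFS= read -r line || [ -n "$line" ]` picks that line up.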

How to extract URLs from a file (from here): Grab this sed script, make it executable (chmod +x list_urls.sed), then:

cat * | ../list_urls.sed
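If the sed script isn’t at hand, a rough alternative is `grep -o` with an extended regular expression. This is a swapped-in technique, not the linked script, and the pattern is a simplification: it only catches http, https and ftp URLs, and ends each one at whitespace, a quote, or an angle bracket. The function name `list_urls` is hypothetical.

```shell
#!/bin/bash

# Prints every http://, https:// or ftp:// URL found in the given files
# (or in standard input when no file is given), one per line.
# -h drops the file-name prefixes grep adds when searching several files.
list_urls() {
	grep -Eoh "(https?|ftp)://[^[:space:]\"'<>]+" "$@"
}
```

Usage mirrors the sed version: `list_urls *`, or `cat * | list_urls`.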

How to count the number of files inside a $directory (from here):

ls -1 "$directory" | wc -l
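`ls -1` counts every visible entry, subdirectories included, skips hidden files, and miscounts names that contain a newline. A bash-only sketch that counts regular files, hidden ones included, without parsing `ls` output (the function name `count_files` is hypothetical):

```shell
#!/bin/bash

# Counts the regular files directly inside directory $1, hidden ones
# included; subdirectories are not counted, symlinks to files are.
# The body runs in a subshell ( ... ), so the shopt changes stay local.
count_files() (
	shopt -s nullglob dotglob   # empty dir -> zero matches; * also matches dotfiles
	n=0
	for entry in "$1"/*; do
		if [ -f "$entry" ]; then
			n=$((n + 1))
		fi
	done
	echo "$n"
)
```

Usage: `count_files "$directory"`.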