Regex golf with calibre

December 20th, 2014

I noticed one of my ebooks is a bit odd. It’s potentially a DRM issue as I’m converting it through formats to read on an old (and so far moderately indestructible) nook. Seems that speech marks and apostrophes have been converted to question marks, so you end up with:

?Have you seen the screwdriver?? ?Didn?t I already give you it? Sure it?s not with you?? ?It?s all right, I?m an idiot, it?s right here!?

Gets annoying pretty quickly, even with better dialogue, so regex to the rescue.

First the apostrophes. We only need to match question marks in the _middle_ of words, not at the end, so that’s nice and easy:

# Search: (\w)\?(\w)
# Replace: \1'\2

So, with \w we find a letter (or number, or underscore, but not a space or a hyphen) followed by a question mark (escaped with the backslash because it has special meaning in regex world), followed by another letter (or number, or underscore, but again not a space). We remember the letters in two separate match groups by sticking them in brackets.
We then replace the three characters with the original first letter we found (\1), overwrite the question mark with an apostrophe and then put the second letter back (\2). Boom, straight away it?s, I?m and what?s go back to being readable.
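
For anyone wanting to try this outside calibre, here is a rough equivalent in Python's re module. The pattern and replacement are reconstructed from the description above, so treat them as an assumption rather than the exact strings typed into calibre. The function name is made up for illustration.

```python
import re

# Assumed pattern, rebuilt from the prose: a word character, a literal
# question mark, then another word character, each captured in a group.
APOSTROPHE_RE = re.compile(r"(\w)\?(\w)")


def fix_apostrophes(text: str) -> str:
    """Turn a '?' sandwiched between two word characters into an apostrophe."""
    # \1 and \2 put the surrounding characters back around the apostrophe.
    return APOSTROPHE_RE.sub(r"\1'\2", text)


if __name__ == "__main__":
    sample = "?Didn?t I already give you it? Sure it?s not with you??"
    print(fix_apostrophes(sample))
```

Question marks at the start or end of a word are left alone, which is what we want here: those are the speech marks and the genuine question marks.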

The next bit is more complicated, but thanks to me using calibre to convert it in the first place, it’s littered with calibre’s mad class formatting separating every paragraph or newline. It’s also badly documented as it’s 4am and I really should have gone to bed. Also, I realise this fails in a lot of cases, so part 2 to follow.

<span class="calibre6">?How long is it??</span>
Find: ()\?(.+?)\?() Replace:
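
A rough Python sketch of the same idea, for experimenting outside calibre. The two empty groups in the Find pattern above have clearly lost their contents, so this assumes they wrapped the opening and closing span tags from the example line. That is a guess, as are the replacement string and the function name.

```python
import re

# Assumption: group 1 is the opening calibre span tag and group 3 the closing
# tag; only the lazy middle group (.+?) comes straight from the pattern above.
QUOTE_RE = re.compile(r'(<span class="calibre6">)\?(.+?)\?(</span>)')


def fix_speech_marks(html: str) -> str:
    """Swap the '?' just inside a span's tags for double quotes."""
    # Keep both tags (\1, \3) and the text (\2); only the two '?' are replaced.
    return QUOTE_RE.sub(r'\1"\2"\3', html)


if __name__ == "__main__":
    print(fix_speech_marks('<span class="calibre6">?How long is it??</span>'))
```

This only handles a span that holds exactly one quoted sentence. A paragraph with several bits of speech in one span gets quotes at the very start and end only, which is one of the failure cases mentioned above.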
12 Days of GSM Christmas

December 16th, 2014

On the first day of Christmas,
I cloned OpenBSC
Need an MNC and MCC.

On the second day of Christmas,
compiling Osmocom-BB
Need legit spectrum usage,
And an MNC, MCC.

On the third day of Christmas,
can’t start OpenBSC
Reading 3GPP standards,
Still need some spectrum,
And an MNC, MCC.

On the fourth day of Christmas,
recursive dependencies
BSC controlling,
3GPP hurting,
Haven’t got a license,
Still need an MNC, MCC.

On the fifth day of Christmas,
A radio just for me
BTS Transmitting,
BSC Controlling,
3GPP Headaches,
Dialled back the power,
And an MNC, MCC.

On the sixth day of Christmas,
Runs almost stably
An MSC switched for me!
BTS transmitting,
BSC controlling,
3GPP madness,
Hidden in a cupboard,
And an MNC, MCC.

On the seventh day of Christmas,
Septets have broken me
Seven bit encoding.
MSC for switching,
BTS Transmitting,
BSC Controlling,
Real attenuation,
And an MNC, MCC.

On the eighth day of Christmas,
battling radio frequencies
Eight are the timeslots,
Seven bit encoding,
MSC’s a-switching,
BTS transmitting,
BSC Controlling,
Two AR-FCNs,
And an MNC and MCC.

On the ninth day of Christmas,
No more ISDN for me;
A-BIS over IP,
Eight are the timeslots,
Seven bit encoding,
MSC’s a-switching,
BTS transmitting,
BSC controlling,
Two cells a-serving,
And an MNC & MCC.

On the tenth day of Christmas,
I need some sanity
Logging T-IMSI not the IMSI
Um needs its A-BIS,
14 half-rate timeslots,
Seven bit encoding,
MSCs a-switching,
BTS unlocked!
BSC controls,
Handover fails,
And need an MNC & MCC.

On the eleventh day of Christmas,
Time for secur-ity
A5/1 encryption is key.
TIMSI not an IMSI,
ABIS over IP,
AMR just fails,
Seven bits for texting,
MSC is switching,
BTS transmitting,
BSC controlling,
30-gig of specs,
PCS is 1900,
And an MNC & MCC.

On the twelfth day of Christmas,
Need interoperability
SS7 MAP for me,
A5/3 encryption,
TIMSI not an IMSI,
ABIS over IP,
7 timeslots free,
PDUs encoded,
MSC’s a-switching,
MS won’t you answer?
BSC online,
3GPP – explains it to me,
Two serving cells,
And an MNC & MCC!


I suppose I’m left trying to implement data for Lent.