How to Change String Encoding from KOI8-R to UTF-8 in Objective-C

KOI8-R is an 8-bit character encoding designed for Russian, which uses the Cyrillic alphabet. I was getting a file name from IMAP in this encoding and wanted to convert it to readable UTF-8.

My question about this issue on Stack Overflow was banned, so I had to solve it myself, which took quite a bit of time.

One solution that works for many encodings is to use the CFString functions:

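- (NSString *)decodeKOI8R:(NSString *)stringToDecode {
    // Interpret the C string's bytes as KOI8-R; the call may return NULL on failure.
    CFStringRef aCFString = CFStringCreateWithCString(NULL, [stringToDecode UTF8String], kCFStringEncodingKOI8_R);
    // The Create function returns an owned reference, so hand ownership to ARC to avoid a leak.
    NSString *decodedString = CFBridgingRelease(aCFString);
    return decodedString;
}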

For some reason this worked for UTF-7 but not for KOI8-R.
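
One possible explanation, offered as a guess rather than a diagnosis: -UTF8String hands CFString the UTF-8 bytes of a string that has already been decoded, whereas kCFStringEncodingKOI8_R expects the raw KOI8-R bytes. To show what a conversion from raw bytes looks like, here is a minimal sketch in plain C (which also compiles as Objective-C) using POSIX iconv. The helper name koi8r_to_utf8 is hypothetical and is not part of MailCore or Core Foundation.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated KOI8-R byte string into a
   newly allocated UTF-8 string. Returns NULL on failure; the caller frees
   the result. */
char *koi8r_to_utf8(const char *input) {
    iconv_t cd = iconv_open("UTF-8", "KOI8-R");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(input);
    /* Every KOI8-R character lies in the Basic Multilingual Plane, so one
       input byte needs at most 3 UTF-8 bytes. */
    size_t out_left = in_left * 3;
    char *output = malloc(out_left + 1);
    if (output == NULL) {
        iconv_close(cd);
        return NULL;
    }

    /* iconv takes char ** for historical reasons; the input is only read. */
    char *in_ptr = (char *)input;
    char *out_ptr = output;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        free(output);
        return NULL;
    }
    *out_ptr = '\0';
    return output;
}
```

Called with the six KOI8-R bytes F0 D2 C9 D7 C5 D4, this should return the UTF-8 text "Привет". On macOS you may need to link with -liconv.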

I searched Google for hours without finding a solution. Eventually I found one in MailCore's CTBareAttachment class, in a method called decodedFilename.

This is the Objective-C code of that method:

-(NSString*)decodedName {
    return MailCoreDecodeMIMEPhrase((char *)[self.name UTF8String]);
}

MailCoreDecodeMIMEPhrase is a C function that converts a char* to an NSString with the proper encoding:

NSString *MailCoreDecodeMIMEPhrase(char *data) {
    int err;
    size_t currToken = 0;
    char *decodedSubject = NULL;  // start as NULL so the error path can test it safely
    NSString *result;
    if (*data != '\0') {
        // Parse the MIME phrase and convert its encoded words to DEST_CHARSET.
        err = mailmime_encoded_phrase_parse(DEST_CHARSET, data, strlen(data),
                                            &currToken, DEST_CHARSET, &decodedSubject);
        if (err != MAILIMF_NO_ERROR) {
            // Release any partial result before bailing out.
            if (decodedSubject != NULL)
                free(decodedSubject);
            return nil;
        }
    } else {
        return @"";
    }
    result = [NSString stringWithCString:decodedSubject encoding:NSUTF8StringEncoding];
    free(decodedSubject);
    return result;
}

Here

#define DEST_CHARSET "UTF-8"

and mailmime_encoded_phrase_parse is a fairly complex function, too long to copy and paste here. In short, it parses a phrase and calls a C decoding function for each word. So this problem seems to have no easy solution other than using this C function, and that is what I ended up doing.
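
To give a feel for what the per-word decoding involves, here is a much reduced sketch in plain C. It is not the mailmime_encoded_phrase_parse implementation: it assumes the input is a single base64 ("B") encoded word of the form =?charset?B?text?= as described in RFC 2047, ignores Q-encoding and multi-word phrases, and relies on POSIX iconv for the charset conversion. The names b64_value, b64_decode and decode_b_encoded_word are hypothetical.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode up to len base64 characters into out, stopping at padding or any
   other non-alphabet character. Returns the number of bytes written. */
static size_t b64_decode(const char *in, size_t len, char *out) {
    unsigned int buffer = 0;
    int bits = 0;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        int v = b64_value(in[i]);
        if (v < 0) break;
        buffer = (buffer << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[n++] = (char)((buffer >> bits) & 0xFF);
        }
    }
    return n;
}

/* Decode one B-encoded word, e.g. "=?KOI8-R?B?8NLJ18XU?=", to a newly
   allocated UTF-8 string. Returns NULL for anything it does not understand. */
char *decode_b_encoded_word(const char *word) {
    size_t word_len = strlen(word);
    if (word_len < 8 || strncmp(word, "=?", 2) != 0 ||
        strcmp(word + word_len - 2, "?=") != 0)
        return NULL;

    const char *charset = word + 2;
    const char *q1 = strchr(charset, '?');           /* end of the charset name */
    if (q1 == NULL || q1 == charset || q1 - charset >= 64) return NULL;
    if ((q1[1] != 'B' && q1[1] != 'b') || q1[2] != '?') return NULL;

    const char *text = q1 + 3;                       /* start of the base64 text */
    const char *end = word + word_len - 2;           /* the closing "?=" */
    if (text > end) return NULL;

    char charset_name[64];
    size_t cs_len = (size_t)(q1 - charset);
    memcpy(charset_name, charset, cs_len);
    charset_name[cs_len] = '\0';

    size_t text_len = (size_t)(end - text);
    char *raw = malloc(text_len + 1);                /* decoded data is shorter than base64 */
    if (raw == NULL) return NULL;
    size_t raw_len = b64_decode(text, text_len, raw);

    iconv_t cd = iconv_open("UTF-8", charset_name);
    if (cd == (iconv_t)-1) { free(raw); return NULL; }

    size_t out_left = raw_len * 4;                   /* generous bound; too small means NULL */
    char *utf8 = malloc(out_left + 1);
    if (utf8 == NULL) { iconv_close(cd); free(raw); return NULL; }

    char *in_ptr = raw, *out_ptr = utf8;
    size_t in_left = raw_len;
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    free(raw);
    if (rc == (size_t)-1) { free(utf8); return NULL; }
    *out_ptr = '\0';
    return utf8;
}
```

For the word =?KOI8-R?B?8NLJ18XU?= this should return "Привет" in UTF-8. A real attachment name can contain several encoded words mixed with plain text, which this sketch does not cover, so in practice I would still rely on the library function.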
