I recently came across a problem while crawling a BCS data source in SharePoint 2010. Most of the 130,000 records were getting indexed, but I was getting an error on 25 records:
Error while crawling LOB contents. ( ‘.’, hexadecimal value 0x00, is an invalid character. Line 25, position 10. )
When I first took a look at the offending records in the database I couldn’t see any actual problems. The issue was that the null character in the data was being treated as a string terminator, so it was not being returned with a standard select query. I was finally able to detect the null values by casting the suspected offending column to varbinary:
SELECT CAST(TheColumn AS varbinary(max)) FROM TheTable WHERE TheID = 123
Casting to varbinary returns the character values in the column as hexadecimal. Once the values were converted to hexadecimal I could see the null values represented as 0000 (In my case the offending column was nvarchar, if your column is varchar you’ll be looking for 00).
Now that I could see the problem, I needed a way to quickly and easily remove the characters from any records that contained them in the database. I found a post on StackOverflow that finally pointed me in the right direction:
http://stackoverflow.com/questions/2828333/what-is-the-null-character-literal-in-tsql
Simply searching for char(0) didn’t return any values in my case. However if the data is cast from nvarchar to varchar and the collation is set to ‘SQL_Latin1_General_CP1_CI_AS’ it does work. Note that the cast to varchar was required for the null character to be detected. Luckily in my case the offending records didn’t contain Unicode characters. Here’s what I finally ended up with to search for and remove the null characters:
UPDATE TheTable SET TheColumn = REPLACE(CAST(TheColumn as varchar(max)) COLLATE SQL_Latin1_General_CP1_CI_AS, CHAR(0), '') WHERE CAST(TheColumn as varchar(max)) COLLATE SQL_Latin1_General_CP1_CI_AS like '%' + CHAR(0) + '%'
Leave a Reply