Talk:Stegafoto: a lens which embeds audio and text inside images

From Nokia Developer Wiki

Hamishwillee - Subedited/Reviewed

Hi vnuckcha

Thank you for this extremely fun article - the concept is new to me. I think the way you structured it is quite good: I barely glanced at the code and have a fair idea how it works.

I have given this a very minor review for wiki style (could do a bit more, but holding off for now):

  1. Made the abstract a single short sentence at the top, which makes it easier for people to decide whether they want to read it or not
  2. Moved most of the rest into a slightly more concise introduction. I think it is more readable now, please check you are happy with the changes
  3. Added SDK and platform to the ArticleMetaData - could you confirm what device you tested this on?
  4. Changed the wording slightly from "loss free" to "does not visibly distort or " for the image. There is a loss; it just isn't visible to the naked eye because it's only one bit for each channel.

Two "general" suggestions

  1. Would it be possible to add a zip containing your WP project with this code - that way other people can test this out themselves without having to worry about copy-paste errors
  2. Can you please update your profile (vnuckcha) to say a little about you and then make it public

In terms of the article, I think it is both interesting and useful (the technique was new to me). Two things are slightly incorrect, and one confuses me:

  1. "Once this case is understood, the audio part will be explained on top of that" - it isn't explained, though I think you could do this in a few seconds.
  2. In the image with before and after, the numbers in the boxes are the same, i.e. you can't see which bit has changed.
  3. Are you saying that originally we have the change below? That confuses me a bit: p1 appears to have changed just the most significant byte while p2 appears to have changed the least significant.
       • Before: p1 = {255, 26, 35, 2} and p2 = {255, 26, 34, 3}.
       • After: p1 = {254, 25, 34, 2} and p2 = {255, 26, 34, 2}.
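To check the numbers above, a short Python sketch (Python is only our choice for illustration; the project itself is a Windows Phone app plus a JavaScript viewer) compares the two pixels channel by channel and reads the least significant bit of each channel of the "after" pixels. It assumes the channels are listed in A, R, G, B order. Under that assumption every channel moves by at most one, and the eight least significant bits read 01001000, which is the ASCII code for 'H'.

```python
# Pixels copied from the comment above, as (A, R, G, B) tuples.
before = [(255, 26, 35, 2), (255, 26, 34, 3)]
after = [(254, 25, 34, 2), (255, 26, 34, 2)]

def changes(old, new):
    """Per-channel difference between two lists of pixels."""
    return [tuple(n - o for o, n in zip(p, q)) for p, q in zip(old, new)]

def lsbs(pixels):
    """Least significant bit of every channel, in A, R, G, B order."""
    return "".join(str(channel & 1) for pixel in pixels for channel in pixel)

print(changes(before, after))   # [(-1, -1, -1, 0), (0, 0, 0, -1)]
bits = lsbs(after)
print(bits, chr(int(bits, 2)))  # 01001000 H
```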

Thoughts?

Regards

Hamish

hamishwillee 08:01, 31 January 2013 (EET)

Vnuckcha - In reply to the above comment

Hi Hamish,

Thanks for re-arranging the structure of the document so that it meets the Wiki's styling requirement. Please find below answers to your questions:

Q: Could you confirm what device you tested this on?
A: I tested the Lens on the Lumia 920 and 820, and the WebGL viewer on Firefox 18 and Chrome 24.

Q: Would it be possible to add a zip containing your WP project with this code?
A: Yes, I have uploaded the source code for both the Lens (Windows Phone project) and the Viewer (JavaScript + HTML). Remember that the code is fragile (e.g. do not record long sentences, as this would crash the app) and is provided as is.

Q: Can you please update your profile (vnuckcha) to say a little about you and then make it public?
A: I have made my profile public, but I do not know what to say about myself that would be of interest.

Some responses to some of the above comments:

C: "Once this case is understood, the audio part will be explained on top of that" - it isn't explained, though I think you could do this in a few seconds.
R: I actually explain it at the bottom of that section. Ref:

   "Now, when it comes to recording an audio (as in the above video), the process is the same except that we have to convert the captured sound from the microphone into a string and this is done in the following manner:
   Get the recorded sound bite as PCM data and apply the relevant header in order to convert it to WAV. The algorithm for converting PCM to WAV can be found here.
   Next take WAV data as a sequence of bytes and convert it to Base64 encoding in order to get a string representation.
   Take the string and pass it to the convert-to-binary method mentioned above." 

C: In the image with before and after, the numbers in the boxes are the same.
R: You are correct here. I updated the illustration accordingly. Thanks!

C: That confuses me a bit, p1 appears to have changed just the most significant byte while p2 appears to have changed the least significant.
R: You are confused because you are assuming that there is a most and least significant byte. Allow me to explain:

    In a pixel (which is a sequence of 4 bytes), there is no ranking among these bytes. Each byte corresponds to a channel: the left-most one, which you are referring to as most significant, represents the transparency level at that pixel and is known as Alpha, while the right-most one is known as Blue and represents the amount of blue color in that pixel. As you may realize, each channel is represented by 1 byte (and again they are all "equal" because they do distinct things). Now, because we are talking about a byte, we are referring to a sequence of 8 bits where there is positional order. This means that when we examine a byte, there is a most-significant and a least-significant bit. The least-significant bit corresponds to the integer 1, while the most-significant corresponds to the integer 128. For example, if I give you the sequence 10000001, I am representing the integer 129. Therefore, when I change the least-significant bit, I am effectively making the smallest possible change in the integer representation of that byte. For example, if I change the least-significant bit of 10000001, I get 10000000, which is 128 (129 - 128 = 1). Hopefully this helps clarify things. I have also added this explanation to the article in case there is similar confusion.
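The bit arithmetic in this reply can be checked with a few lines of Python (an illustration only, not code from the project): it lists the weight of each bit position in a byte and clears the least significant bit of 10000001.

```python
# Weight of each bit in one byte, from the most significant (left) to the least significant (right).
weights = [1 << position for position in range(7, -1, -1)]
print(weights)               # [128, 64, 32, 16, 8, 4, 2, 1]

value = int("10000001", 2)   # 128 + 1
print(value)                 # 129

cleared = value & ~1         # clear only the least significant bit
print(format(cleared, "08b"), cleared, value - cleared)  # 10000000 128 1
```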

Thanks again for the comments and corrections :)

Vik

vnuckcha 09:49, 31 January 2013 (EET)

Vnuckcha - Comment output is not good

Hi Hamish,

I am unable to edit my comments above so that they are readable. It would be great if you could fix the formatting problem that is creating the above mess (horizontal scrollbar).

Thanks in advance.

vnuckcha 10:09, 31 January 2013 (EET)

Hamishwillee - Further subedit - needs a new review

Hi Vik

Thanks very much!

I now understand this better, but I think the "simple" explanation is too complicated. The main problem was that we talk about 1s and 0s and odds and evens and switching values.

What I think is actually happening is that you're just replacing the least significant bit of each channel with the data to be encoded (i.e. not "flipping"); of course, when you replace it, the value might not actually change. Is that correct? If not, then I have to say I'm still confused.

It's also not clear to me how you know you've reached the end of your encoded content (i.e. "hi"?), or that there is encoded content in a particular image.

I have reworded this below (i.e. in this comment). Can you check that it looks OK? This will need a little tidying, but I think it is a better way of explaining what is going on. Even if I'm wrong in my explanation, I think this "structure" is better and should be adopted for the section rather than algorithm steps.

I haven't bothered to change your comments yet because I can read them, and I'll delete them when we're completely finished. If you want to do this yourself, press the admin link.

Note, I also refer to the process as embedding now.

Regards H

Embedding the data

There are two parts to embedding the data:

  • Converting the data to be stored into a binary sequence (a sequence of '0' and '1')
  • Encoding the data into the image

Converting data to a binary sequence

Converting text data into a binary sequence is easy - we simply use some unique sequence of '0' and '1' to represent each character, and then string the sequences together. For example, using the ASCII encoding 'H' is assigned to 72 decimal (which is 01001000 as a base 2 number stored as 8-bits in a computer) and 'i' is assigned to 105 in decimal (01101001 in binary); "Hi" can therefore be represented as the binary sequence 0100100001101001.

Note: There are numerous encoding schemes for defining what sequence is used to represent each character. In this example we will use ASCII because it is very simple to understand and implement.
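A minimal Python sketch of this conversion, assuming plain ASCII text and 8 bits per character written most significant bit first. The function names text_to_bits and bits_to_text are hypothetical and not taken from the project.

```python
def text_to_bits(text: str) -> str:
    """Encode text as ASCII and return 8 bits per character, most significant bit first."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))

def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits: read 8-bit groups back into ASCII characters."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")

print(text_to_bits("Hi"))                # 0100100001101001
print(bits_to_text("0100100001101001"))  # Hi
```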

Recording audio (as in the above video) is the same process, except that we have to convert the captured sound from the microphone into a binary sequence. This is done in the following manner:

  1. Get the recorded sound bite as PCM data and apply the relevant header in order to convert it to WAV. The algorithm for converting PCM to WAV can be found here.
  2. Next take WAV data as a sequence of bytes and convert it to Base64 encoding in order to get a string representation.
  3. Take the string and pass it to the convert-to-binary method mentioned above.
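The three steps can be sketched in Python with the standard-library wave and base64 modules. This is not the project's Windows Phone code, and the recording format (8 kHz, 16-bit, mono) is an assumption, since the source does not state it. The bit conversion at the end is the same 8-bits-per-character ASCII scheme as for text.

```python
import base64
import io
import wave

def pcm_to_wav(pcm: bytes, sample_rate: int = 8000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Step 1: wrap raw PCM samples in a WAV (RIFF) header. The format defaults are assumptions."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample; 2 means 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()

def audio_to_bits(pcm: bytes) -> str:
    wav_bytes = pcm_to_wav(pcm)
    text = base64.b64encode(wav_bytes).decode("ascii")   # Step 2: Base64 string representation
    # Step 3: the same text-to-binary conversion used for typed text (8 bits per ASCII character).
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))

pcm = bytes(1600)  # placeholder data: 0.1 s of 16-bit mono silence at 8 kHz
bits = audio_to_bits(pcm)
print(len(pcm_to_wav(pcm)), len(bits))
```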

Encoding data into the image

Encoding data into the image is only a little more complicated.

An image is constructed from a huge number of coloured dots or "pixels". Using the ARGB colour model, each pixel is made up of four bytes (a byte is 8 bits), which contain the values of the Alpha (opacity/transparency), Red, Green and Blue "channels" respectively (these channels define the final appearance of the pixel).

As each of the channels has 8 bits, it can represent a value from 0 to 255 in decimal (00000000 to 11111111 in binary). The right-most "bit" is called the least significant bit; if this bit is changed, the decimal value of the channel will change by one (for example, 11111111 = 255 to 11111110 = 254).

To encode our data, we'll set the least significant bit of each of the channels to the data we want to encode. This may not change the value of a particular channel (if its least significant bit already matches the data bit), but even if it does, such a change will be virtually undetectable to the human eye.
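Replacing the least significant bit of one channel, as described in the paragraph above, can be sketched in Python like this (set_lsb is a hypothetical helper name). Whatever the data bit, the channel value either stays the same or moves by exactly one.

```python
def set_lsb(channel: int, bit: int) -> int:
    """Replace the least significant bit of an 8-bit channel value with `bit` (0 or 1)."""
    if not 0 <= channel <= 255 or bit not in (0, 1):
        raise ValueError("channel must be 0-255 and bit must be 0 or 1")
    return (channel & 0b11111110) | bit   # clear the lowest bit, then write the data bit

print(set_lsb(255, 0))  # 254 (changed by one)
print(set_lsb(255, 1))  # 255 (bit already matches, value unchanged)
print(set_lsb(26, 1))   # 27
```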

As each pixel has four channels it can store four bits of data. This means that we'll need 2 pixels to store the 8 bits for every character, and 4 pixels for our "Hi" string (16 bits).
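The whole embedding step can be sketched in Python as below. The function names and the four cover pixels are made up for illustration, channels are filled in A, R, G, B order, and the number of hidden bits is passed in explicitly because the source does not say how the end of the data is detected (the question raised above). Four pixels hold the 16 bits of "Hi", as the paragraph says.

```python
def embed_bits(pixels, bits):
    """Write one bit into the LSB of each channel, filling pixels in A, R, G, B order."""
    channels = [c for pixel in pixels for c in pixel]   # flatten to a list of channel values
    if len(bits) > len(channels):
        raise ValueError("image too small for this data")
    for i, bit in enumerate(bits):
        channels[i] = (channels[i] & 0b11111110) | int(bit)
    return [tuple(channels[i:i + 4]) for i in range(0, len(channels), 4)]

def extract_bits(pixels, count):
    """Read back the first `count` LSBs in the same channel order."""
    return "".join(str(c & 1) for pixel in pixels for c in pixel)[:count]

hi_bits = "0100100001101001"          # "Hi" in ASCII
cover = [(255, 120, 200, 33)] * 4     # four made-up ARGB pixels
stego = embed_bits(cover, hi_bits)
print(stego)
print(extract_bits(stego, 16) == hi_bits)  # True
```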

<image here maybe>

If we use a 640 x 480 image resolution, then we can store (640 x 480 / 2 =) 153600 bytes of data (characters). This corresponds to about 7 seconds of audio based on the approach used in the previous section.
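The capacity arithmetic can be written out in Python. The audio estimate assumes, purely for illustration, 8 kHz 16-bit mono audio with a standard 44-byte WAV header (the source does not give the recording format), and that Base64 turns every 3 bytes into 4 characters. Under those assumptions the result is roughly 7.2 seconds.

```python
def capacity_bytes(width: int, height: int, channels: int = 4, bits_per_channel: int = 1) -> int:
    """Bytes (characters) that fit when one bit is stored in each channel of every pixel."""
    return width * height * channels * bits_per_channel // 8

def audio_seconds(capacity: int, sample_rate: int = 8000, sample_width: int = 2, header: int = 44) -> float:
    """Seconds of audio that fit, under the assumed format (mono, 16-bit, 44-byte WAV header)."""
    wav_bytes = capacity * 3 // 4   # Base64 decodes every 4 characters back to 3 bytes
    return (wav_bytes - header) / (sample_rate * sample_width)

cap = capacity_bytes(640, 480)
print(cap)                           # 153600
print(round(audio_seconds(cap), 1))  # 7.2
```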

hamishwillee 07:09, 1 February 2013 (EET)

Vnuckcha -

Hi Hamish,

I rewrote the encoding section completely, avoiding explaining certain things. I think that it abstracts things even more (e.g. no talk of HEX and ASCII). I also abstracted the explanation of the odd-even business and replaced the illustrations with a new one. All in all, I hope that the article is easier to understand while retaining its essence.

Looking forward to hearing from you.

Vik

vnuckcha 13:58, 5 February 2013 (EET)

Pooja 1650 - Nice article!

Hello Vnuckcha,

Your article is very interesting. The way you described things is also good.

Keep it up!

Thanks,

Pooja

pooja_1650 14:53, 5 February 2013 (EET)

Vnuckcha -

Thanks Pooja!

Kind Regards,

Vik

vnuckcha 14:02, 7 February 2013 (EET)

Hamishwillee - This is much better

Hi Vik

Yes, this is pretty good - much better than it was. Thank you.

There is probably a bit more that could be done to subedit - I will try to find time later today (but if not, it is perfectly acceptable).

Regards

Hamish

hamishwillee 07:33, 11 February 2013 (EET)

Aakash95 - excellent wiki

excellent wiki here everyone, thank you!

Aakash95 04:33, 20 February 2013 (EET)

Aady - Very Nice Article

Loved the concept & article !!!! Good one :)

Regards,

Aady

Aady 12:57, 22 February 2013 (EET)

Yan -

Hi. Fun article. I think you should explain why you use PNG instead of JPG.

Note the greenish tint on the image comes from the 3rd party PNG encoder that I am using and not from the data fusion process.
I'm not sure. For me it is only a monitor display problem. If imagetools changed your RGB values, your process would not work.

yan_ 17:52, 22 February 2013 (EET)

Vnuckcha - Thanks

Thanks Aakash, Aady and Yan for your kind words. I am glad you found the article interesting and fun :)

@Yan - I did not want to explain why I chose PNG over JPG in the main article, as it adds to the length of the article and is a side issue to its main purpose. I used PNG instead of JPG because it guaranteed that the pixel changes I made would be preserved. JPG uses a convoluted (lossy) compression algorithm in which pixel values can change very slightly, and that is enough to destroy the data in the prototype I was making. Of course, one could devise a more resilient algorithm for storing data; I leave that as an exercise for the reader.
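A toy Python illustration of why a lossless format matters here. It does not implement JPEG: the single +1 change below simply stands in for the slight pixel changes described above, and the cover value 200 is arbitrary. One channel moving by one level flips a hidden bit, so the first character no longer decodes as 'H'.

```python
def lsb_bits(values):
    """Least significant bit of each channel value, as a string of 0s and 1s."""
    return "".join(str(v & 1) for v in values)

payload = "0100100001101001"                                 # "Hi" in ASCII
channels = [(200 & 0b11111110) | int(b) for b in payload]    # hide one bit per channel (200 or 201)

nudged = list(channels)
nudged[0] += 1   # stand-in for lossy re-encoding: one channel changes by a single level

print(lsb_bits(channels) == payload)       # True  (values preserved exactly, as with PNG)
print(lsb_bits(nudged))                    # 1100100001101001
print(int(lsb_bits(nudged)[:8], 2))        # 200 instead of 72 ('H')
```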

Thanks again to all three of you for your kind words.

vnuckcha 08:08, 25 February 2013 (EET)

Yan -

Hi. I know the difference between PNG and JPEG, but not every reader does.

I think your explanation is good and could be added to your article ;)

yan_ 08:45, 25 February 2013 (EET)

Vnuckcha - Changes made :)

Hi Yan,

I added that explanation to the article. Thanks for the feedback.

Cheers

vnuckcha 09:23, 25 February 2013 (EET)

 
