Patterico's Pontifications

8/16/2009

What Does “Page Info” Mean?

Filed under: Blogging Matters — DRJ @ 5:32 pm



[Guest post by DRJ]

I’m working on something where I need to verify what “Page Info” means on a webpage. I think it involves the properties of the webpage — the language used, the URL address, the date created or modified, and other information. I’m most interested in what it says about the date.

If I right-click to get the Page Info for a webpage and it tells me the “Date Last Modified” is August 16, 2009, doesn’t that mean the webpage was created or last modified on August 16, 2009? If that’s true, is this always correct or are there exceptions?

— DRJ

41 Responses to “What Does “Page Info” Mean?”

  1. No, DRJ, that means that the webpage is claiming that it was last modified then. It’s a piece of data from the server itself that is not verified.

    SPQR (26be8b)

  2. SPQR,

    I assume you are saying the date can be manipulated on the webpage. Would it be true that only the webpage’s server can encode the Page Info date, so only people with access to the webpage’s server can change that date?

    DRJ (d8773e)

  3. If that’s true, would it also be true that the only way the Page Info date would be inaccurate is if the server’s date was set incorrectly or someone with Admin privileges changed the date?

    DRJ (d8773e)

  4. I treat that tag as meaningless; some people write code to modify a page so that it always appears “up to date”, even though the only thing that’s changed in the last three years is the last date accessed — which they display as “Date last modified”; it was, indeed, last modified then. I have an opinion of this practice which I think you can deduce.

    htom (412a17)

  5. It means Info doesn’t want to give you his cell number.

    happyfeet (d8cd81)

  6. DRJ, re: your #2 and #3 – mostly true but you can’t guarantee it. If it’s an evidentiary issue, it’s a problem.

    SPQR (26be8b)

  7. SPQR,

    Assume the Page Info indicates a webpage posted information on August 8 that wasn’t available to the general public until August 10. I realize this would not be conclusive evidence but if I wanted to argue someone had this information on August 8, wouldn’t this webpage’s Page Info be some evidence of that fact?

    DRJ (d8773e)

  8. Well, it tells me that I have visited this page 552 times.

    MIke K (addb13)

  9. Don’t trust that in the least, DRJ. I just right-clicked on pictures I took digitally and posted on 8/9 and Properties told me that they were both created and modified on 8/16.

    nk (544046)

  10. DRJ, if you got past the hearsay issue, it would be evidence that the information was available on August 8th. Its weight could be argued if the opposing side was competent … 😉

    SPQR (26be8b)

  11. #7 DRJ:

    Assume the Page Info indicates a webpage posted information on August 8 that wasn’t available to the general public until August 10. I realize this would not be conclusive evidence but if I wanted to argue someone had this information on August 8, wouldn’t this webpage’s Page Info be some evidence of that fact?

    Not in the least.

    EW1(SG) (edc268)

  12. The general rule is that “date last modified” refers to the date on which the document in question was last modified. The flip side is that there’s a hacker exception to everything.

    Xrlq (62cad4)

  13. The flip side is that there’s a hacker exception to everything.

    Or programming. Nobody has hacked my site. But my Properties exist only for you when you click on them.

    nk (aa7de7)

  14. That’s what I think, SPQR, but I don’t think EW(1)SG agrees. Why not, EW(1)SG?

    DRJ (d8773e)

  15. When I right-click your blog’s main page (using IE8), the “properties” say it was created and modified on August 2, 2009.

    When I do the same after clicking the comments link to go to this particular entry, the dates are not given at all.

    Beats me.

    Micajah (918bfe)

  16. Depends on whether you’re dealing with a blogging platform or not. For example, the page info on my blog’s homepage changes with every post to the date I posted . . . even on pages lower in my domain. (Likely because it’s dynamic software and people regularly comment, which means my sidebar is modified.) That said, I can temporarily change the date of a particular post by changing the header, so long as no one comments, and I can turn off comments, so I could make it look like I posted whenever I wanted.

    SEK (9e7eee)

  17. First rule of network data: Unless a program you wrote, running on a machine only you have access to, wrote it, don’t believe it if doing so puts you or something valuable to you in jeopardy.

    Second rule of network data: If the source certifies or otherwise guarantees the validity of the data, see “First rule” above.

    Larry Sheldon (86b2e1)

  18. SEK is a racist tool of Teh.Patriarchy.

    JD (848ef6)

  19. DRJ, you did have an investigator create screen shots so that you can have someone testify?

    SPQR (26be8b)

  20. There are screen shots but it’s not for a legal matter. I’m curious about it as a blogger, so I can determine whether it’s fair to believe a post with Page Info dated August 8 was really posted on that date.

    DRJ (d8773e)

  21. It’s fair enough until someone proves otherwise.

    SPQR (26be8b)

  22. And to prove it, JD, I’ll write a post on 10 September 2001 that predicts 9/11 . . . and my metadata will back me up!

    (Also, random much?)

    SEK (9e7eee)

  23. Random doesn’t scale I don’t think. It would be nice if it did so people could pace themselves.

    happyfeet (d8cd81)

  24. Hi, DRJ. The date-last-modified thing is part of HTTP, the Hypertext Transfer Protocol, and it is optional. If you use (for instance) Firefox, the Page Info for a page with no Last-Modified: information just shows the date and time at which your browser fetched the web page.

    A Last-Modified timestamp more than (say) an hour before you browsed to that page is a strong indication (but not, alas, proof) that the web page was indeed last modified then … at least, according to the server computer’s clock. Also, that timestamp applies to the whole page. If the page has dynamic sections (e.g., number(s) of comments, list of recent posts), you can deduce that the static part of the page was probably last changed before or at that time, but the Page Info cannot tell you any more than that.

    (Additional OT info: Some blog software constructs each page from scratch when a browser requests it. Better systems, like WordPress here, “cache” the page after each change — for instance, when a comment is published — and send the cached copy to browsers. In this case, the Last-Modified timestamp will probably be meaningful … but, again, there are no 100% guarantees.)

    CChittleborough (34e482)
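
For anyone who would rather look at the raw value than at the browser’s Page Info dialog, here is a minimal Python sketch of what the comment above describes: read the optional Last-Modified header and treat a missing one as “no claim at all.” The function names and the sample header value are made up for illustration, and whatever comes back is only what the server chose to say, not proof.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional
from urllib.request import Request, urlopen


def last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the server-claimed Last-Modified time, or None if absent or unparseable."""
    raw = headers.get("Last-Modified")
    if raw is None:
        return None  # the header is optional, so many pages simply omit it
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # malformed date string


def fetch_last_modified(url: str) -> Optional[datetime]:
    """Send a HEAD request and report whatever Last-Modified value the server sent."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return last_modified(resp.headers)


# Offline example with a hand-written header (hypothetical value).
claimed = last_modified({"Last-Modified": "Sat, 08 Aug 2009 14:30:00 GMT"})
print(claimed)
```

Calling `fetch_last_modified` on a real URL needs network access; a result of None corresponds to the case where the browser has nothing to show except its own fetch time.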

  25. I want to be CChittleborough when I grow up.

    happyfeet (d8cd81)

  26. And to prove it, JD, I’ll write a post on 10 September 2001 that predicts 9/11 . . . and my metadata will back me up!

    Be sure to blame whitey! 🙂

    Scott Jacobs (d027b8)

  27. What frightens me is that in a few years, I will be…

    🙂

    Scott Jacobs (d027b8)

  28. CChittleborough,

    I want to fairly analyze some Page Info data and your comment has been very helpful. Here is my basic question: Assume I view a webpage’s Page Info on August 16, 2009, and it tells me the webpage was last modified August 8, 2009. Is it fair to assume, at least until proven wrong, that what appears at the webpage on August 16, 2009, was also there on August 8, 2009?

    DRJ (d8773e)

  29. Isn’t that what I said ? hehe 😉

    SPQR (26be8b)

  30. Yes, I think it is.

    DRJ (d8773e)

  31. Comment by SPQR — 8/16/2009 @ 9:27 pm

    Yeah, but CC said it like he/she (sorry, no idea what your gender is) knew what he/she (sorry again) was talking about…

    We only barely pay attention to you… 🙂

    Scott Jacobs (d027b8)

  32. It sounds to me like DRJ is safe to assume the web page in question was created as of August 8 — unless someone had an incentive to create it later and backdate it to make it *appear* that it had been created as early as August 8.

    If the Web site creator’s incentive (after the fact) would be to deny early creation of content (and I suspect that is the case), we are looking at something resembling an admission against interest. I think DRJ is safe to post and to let the content creators mount their argument that the content did not exist as of August 8. It sounds like it will be an unconvincing argument.

    Patterico (77cc20)

  33. Depends. (grin)

    Lots of websites, like yours, are database-driven. Those tend to have a “last-modified date/time” as part of the data structure that holds each page in the database. Updating that is a db function.

    mojo (74ba73)

  34. Yes, DRJ, seeing a timestamp of August 8 on August 16 does strongly suggest the page has not changed since August 8.

    And yeah, SPQR did make the same point as me, but adding all that extra detail creates at least an appearance of authority …

    Patterico’s point about resembling an admission against interest is more important than what I wrote. An expert could easily change timestamps around, but they’d need a motive.

    BTW, the C is for Christopher– so I go by Chris to save time. I’m a 50+ computer geek living in rural Australia.

    CChittleborough (34e482)

  35. Welcome aboard, Chris! Hope ya stick around.

    Have an Irish friend who moved to – I believe – Melbourne. She absolutely loves it.

    Scott Jacobs (d027b8)

  36. #14 DRJ:

    That’s what I think, SPQR, but I don’t think EW(1)SG agrees. Why not, EW(1)SG?

    Sorry to take so long to respond…was just loading a black page.

    As Xrlq notes at #12, there is a hacker exception to everything, and anything passed over the internet is subject to interception and manipulation. In particular, things like datestamps are relatively easily manipulated in web content.

    It’s a slightly different matter if you have the computer in hand where the document was generated, but even so, there is a formidable forensic hurdle to clear in order to be able to say “this data was here before it should have been.”

    EW1(SG) (edc268)

  37. Now that I have had time to finish perusing the thread, as SPQR and Chris Chittleborough point out (and Patterico nicely summarizes) it is fair to assume that a random page off the Internet showing the date data as you describe is likely “honest.”

    How long it stays that way after interest in it has been shown is anybody’s guess.

    EW1(SG) (edc268)

  38. Which is why God invented Screen Caps. 🙂

    Scott Jacobs (218307)

  39. Chris, I was joking. You made a fine explanation.

    SPQR (5811e9)

  40. To add a bit to this, the date you see is a header field sent to your browser by the web server. The primary use of the header these days is to help browsers and caching proxies make caching decisions: your browser (and any caching proxies that may be upstream of you) doesn’t normally re-request information it has seen recently.

    By default, for static pages, Apache (the most widely used web server) will send the modification date of the file. For dynamic content (including many blogging applications, in many configurations), the header is under the control of the software generating the page. Much of the time, “well behaved” software will approximate the default behavior of Apache, but there are times it won’t (when it wants to defeat caching, for instance), and in any case, the person writing the software can make it say whatever they want. I’ve seen a lot of games played with the header for all sorts of reasons, just like every other untrusted HTTP header (or untrusted communications in just about every protocol, for that matter).

    For the spec, if you’re interested, see here.

    fishbane (3a4837)
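
To make the static-file case above concrete, here is a rough Python sketch, assuming a server that derives Last-Modified from the file’s modification time, as the comment says Apache does by default. It shows that anyone who can write to the file can also set that time to whatever they like. The file name, the helper names, and the chosen date are hypothetical.

```python
import os
import tempfile
from datetime import datetime, timezone
from email.utils import formatdate


def http_date_for(path: str) -> str:
    """Format a file's modification time in the style of an HTTP Last-Modified value."""
    return formatdate(os.path.getmtime(path), usegmt=True)


def backdate(path: str, when: datetime) -> None:
    """Overwrite the file's access and modification times with an arbitrary moment."""
    stamp = when.timestamp()
    os.utime(path, (stamp, stamp))


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "post.html")  # hypothetical static page
    with open(page, "w", encoding="utf-8") as fh:
        fh.write("<p>written today</p>")
    honest = http_date_for(page)  # reflects when the file was really written
    backdate(page, datetime(2009, 8, 8, 12, 0, tzinfo=timezone.utc))
    backdated = http_date_for(page)  # now claims August 8, 2009

print(honest)
print(backdated)
```

The content of the file never changes between the two readings; only the timestamp does, which is why the header on its own is weak evidence.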

  41. Principle #1: Computers can be told to lie.

    If someone has a reason to falsify the dates (creation and last-modified) for a web page, they can, so said dates cannot be relied upon.

    If you want to prove a page with certain content was available on a given date, the first thing to do is check a search engine’s cache (like Google’s) or a web archive.

    LarryD (feb78b)

