This system was created by the previous developer.
The idea was to save formatted html for each news article directly to the database, which is then displayed on the article table.
This idea worked fine, but unfortunately people are using Dreamweaver, MS Word, and the like to create this HTML.
Needless to say, with nearly 25,000 articles in my database by now, the article table alone has grown to about 650MB (80% of the database!).
So I'm going to create a new component for them to create articles.
While doing so, I was wondering whether there is something one can do to loop through the database and "clean up" the HTML?
Worst case scenario, I will just leave the existing articles as is, and new articles will have simple HTML, with a single stylesheet for all articles.
I just would have liked to clean up the existing articles. That messy HTML and the huge records are crippling the search functionality.

You will need to know what sort of 'extra' features these web editors add. You can then use a regex search to go through all the records and remove them (without removing the inner text). You could use SQL for this by directly manipulating the database.
Don't forget to try it on a backup first.
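
For what it's worth, here is a rough sketch of what such a regex pass could look like in Python. Everything specific in it is an assumption on my part: the table name `articles`, the columns `id` and `body`, and the use of `sqlite3` as a stand-in for whatever database you actually run. The patterns only cover markup that Word and Dreamweaver commonly emit (comments, Office namespace tags such as `<o:p>`, `<font>`/`<span>` wrappers, inline `class`/`style`/`lang` attributes), so check them against your real articles and adjust.

```python
import re
import sqlite3

# Illustrative patterns only: assumptions about typical Word/Dreamweaver output.
# Order matters: whole blocks are dropped first, then tags, then attributes.
CLEANUP_PATTERNS = [
    # HTML comments, including Word's <!--[if gte mso 9]> ... <![endif]--> blocks
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),
    # Embedded <style> and <xml> blocks together with their contents
    (re.compile(r"<(style|xml)\b[^>]*>.*?</\1>", re.DOTALL | re.IGNORECASE), ""),
    # Office namespace tags such as <o:p> and </o:p>
    (re.compile(r"</?(?:o|v|w|st1):[^>]*>", re.IGNORECASE), ""),
    # Presentational wrappers: the tags go, the inner text stays
    (re.compile(r"</?(?:font|span)\b[^>]*>", re.IGNORECASE), ""),
    # Inline class/style/lang attributes (quoted values only)
    (re.compile(r"\s+(?:class|style|lang)\s*=\s*(?:\"[^\"]*\"|'[^']*')", re.IGNORECASE), ""),
]


def clean_html(html: str) -> str:
    """Apply every cleanup pattern in order and trim surrounding whitespace."""
    for pattern, replacement in CLEANUP_PATTERNS:
        html = pattern.sub(replacement, html)
    return html.strip()


def clean_articles(conn: sqlite3.Connection) -> int:
    """Clean each article body in place; return how many rows were changed.

    The table `articles` and columns `id`, `body` are hypothetical names.
    """
    # Collect the ids first so only one article body is held in memory at a time.
    ids = [row[0] for row in conn.execute("SELECT id FROM articles")]
    updated = 0
    for article_id in ids:
        (body,) = conn.execute(
            "SELECT body FROM articles WHERE id = ?", (article_id,)
        ).fetchone()
        if body is None:
            continue
        cleaned = clean_html(body)
        if cleaned != body:
            conn.execute(
                "UPDATE articles SET body = ? WHERE id = ?", (cleaned, article_id)
            )
            updated += 1
    conn.commit()
    return updated
```

Point the connection at a restored backup, run `clean_articles(conn)`, and compare a few articles by eye before touching the live table. Regexes are a blunt tool for HTML (the attribute pattern, for instance, would also hit matching text outside a tag), so if the results look off, a proper HTML parser may be the safer route.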